Archive for the 'Review' Category

Review of Perl Best Practices

In my opinion, every Perl programmer needs at least these two books: the Camel, and the Dog (Perl Best Practices).

The Camel teaches you how to program Perl: the syntax and so on. But the Dog teaches you how to program Perl sanely, by recommending that you use only a subset of Perl’s syntax and abilities.

The Camel tells you that there’s more than one way to do it, which is supposed to be a strength of Perl. The Dog tells you which way is “best.” And it tells you why the other ways are worth avoiding, and in which circumstances, and how badly you can get burned if you’re not careful.

The Dog also shows you a lot of ways to do things that you might not otherwise think about. In that sense, it’s not just stripping away. It’s also adding things that you might not get elsewhere. Example: you might not know about List::Util.

The common complaint about Perl being a write-only language doesn’t have to be true if you follow the Dog’s suggestions. Perl can be a very readable language — and I mean readable for ordinary humans, so don’t think I’m just another Perl programmer insisting that something is readable when it looks like static to you. The Dog encourages readable coding. In fact, in the absence of any other coding standard, if you try to “follow the Dog” you’ll be doing a good job. (There’s really no need to invent your own coding standard, in my opinion. Just follow the Dog.)

It’s not perfect. I don’t always agree with its suggestions. For example, I think the use of “magic comments” to automagically add interactive progress bars to command-line programs is just a bad idea. (Comments should never change the behavior of a program). However, the vast majority of its suggestions have come from long experience. If you don’t agree with one of them, it’s worth following it anyway until you consider yourself expert enough to disagree with authority.

I consider this an essential Perl book. No Perl programmer should be without it.

Technorati Tags:, , ,

You might also like:

  1. You have the right to see code samples in an interview
  2. How to Break Web Software

Ubuntu on Dell Inspiron 1501

I recently bought a Dell Inspiron 1501, which I got a great deal on thanks to the fine people at DealNews. The base system was $449 shipped, and I chose to upgrade the processor to dual AMD64s. But I didn’t buy the system that came with Ubuntu pre-installed; for whatever reason, the one that came with Windows offered a special discount (normally the Windows tax for otherwise identical machines appears to be around $150, and I’m certainly not going to run Windows).

Therefore, I was not sure Ubuntu would support all the hardware. It’s the same story it’s been for as long as I’ve been using computers: hardware manufacturers withhold specifications from the Free Software world, so there is always a chance something will be a trouble. The good news is, I’ve only noticed two very minor incompatibilities out of the box.

One is that the Fn+arrow keys won’t change my screen brightness, at least under XFCE. Strangely, my ancient Dell laptop had no trouble with that. I assume the old one was a hardware-controlled feature and this one needs some software support, but I could be wrong.

The other thing is the built-in wireless card, which isn’t supported with Ubuntu 7.04’s drivers out of the box. However, I quickly found a set of drivers for the Broadcom Corporation Dell Wireless 1390 card, and was up and running shortly thereafter. The only thing I had to do after installing the drivers was press the Fn+F2 key, which turns the card on.

Otherwise everything works brilliantly.

And now for a rant: click through to that page about the drivers, and you’ll see an example of what I consider the Ubuntu sudo disease. There’s even a screenshot of someone typing sudo uname -a and using sudo to remove a file he didn’t create with sudo. I think unfortunately, Ubuntu’s policy of allowing one to run any command with sudo has created a crop of people who don’t understand what should and shouldn’t be privileged; some of them seem to believe that ’sudo’ is what you type at the beginning of every command. It completely defeats the purpose and circumvents the security gained by not running as root. For my part, when I want to administer my system, I become root, do what I need to do, and then quit again. I rarely sudo any command other than sudo su -.

But that’s just me.

Technorati Tags:, , , ,

You might also like:

  1. How to set up dual monitors in Ubuntu on Dell Inspiron 1501
  2. Favorite USB wireless card for Ubuntu?
  3. How to set up Gentoo wireless networking on AMD64
  4. Firefox vs. Opera on slow hardware
  5. Credit card expiration dates should conform to standards

A review of MONyog

This is a review of Webyog’s MONyog MySQL monitoring tool. The product is still in beta and has some rough edges, but holds promise. Its design includes some interesting architectural decisions I think will help distinguish it from MySQL’s own Monitoring and Advisory Service.

MONyog overview

MONyog starts an HTTP listener on your machine. After you start the program, you point your Web browser at localhost:9999. Your browser talks to MONyog, which talks to one or many MySQL servers. My impression is that MONyog is meant to be a nearly zero-footprint desktop install you’ll put on your own desktop machine, and monitor remote servers. In other words, you don’t install it on the servers you’ll monitor.

It initially prompts you for connection information, which it saves for the future. After you create and select a connection, MONyog starts watching the MySQL instance. As far as I can see, it just retrieves SHOW STATUS and SHOW VARIABLES, though I believe there are plans for more in the future.

Your web browser, in turn, communicates with MONyog via AJAX. You view the data in a dashboard through Flash graphs. If you have more than one server on your screen, you see the graphs side by side in multiple columns. Here’s a screenshot of the dashboard (nothing’s really happening because this is just watching my wife’s laptop, which is idle).

The MONyog dashboard

You can also navigate through a hierarchical display of counter data that isn’t Flash-graphed, by clicking on the “Show All” in the navigation bar. For example, if you select “InnoDB Cache,” it renders each server side by side in a table, with a bar graph if the value is a percentage. The built-in rules specify warning thresholds for values, and the top-level knows when something underneath it has a warning. In the screenshot below, you can see my InnoDB cache hit ratio is only in the 92% range, which would be worrisome if this laptop were really doing any work. Notice the red dots next to the graph, and also on the far left-hand side of the screen, indicating a problem:

MONyog all counters

The counters are generated from ordinary JavaScript expressions. Anyone who can write JavaScript can customize the built-in counters or write their own code to generate the desired data.

If you’re monitoring Linux servers, you can specify SSH login information and MONyog will also display system stats like CPU usage.

Rough edges and drawbacks

As I said, things are a bit rough around the edges in places. There is no documentation at all, despite the fact that MONyog is featured on the Webyog homepage. (In fact, this article has more information than I have seen anywhere else on the web). As of yet it doesn’t fetch or expose much variety of data from MySQL. The Linux version is RPM-only; there’s not even a .tar.gz to download, and I wasn’t able to get it working on my Linux machine. The Linux version is also very out of date compared to the Windows version, which I tried to use under Wine but couldn’t.

As far as I can tell, it is based on some open-source software such as the Netscape Portable Runtime, but is not Free Software itself. This is a definite drawback for me.

Zero-install on servers

There’s no need to install anything on the servers you’re going to monitor, which is refreshing. From what I’ve heard, the MySQL Monitoring and Advisory Service (formerly code-named Merlin) actually requires you to install an agent on each server you’re watching, as well as another server which is dedicated to collecting data from the agents. Apparently these agents are anything but lightweight and unobtrusive; I have heard a few people griping about the need to install a Java runtime and a bunch of other things (see Jan’s comment below; this is just hearsay, and I perhaps shouldn’t have written this).

In contrast, MONyog is zero-install on your MySQL servers. It simply opens a connection and requests status and other information. I think this is really appealing from a simplicity and maintenance point of view. It means you can start monitoring an unlimited number of servers without touching the configuration on any of them. All your MONyog configuration is in a single place.

I’m not picking on the MySQL folks with my comments about installing an agent. It is one of my favorite gripes. Nagios and Munin want to install agents, too. I really dislike this. In the age of SSH, there should be more agent-less monitoring systems.

JavaScript object model

MONyog populates a JavaScript object hierarchy with the data it fetches. I think people will find it easy to customize the interface, especially since you can do it from within your web browser. You just click on the file you want to edit — including any of the built-in ones — and you see the code. It’s all there; there is nothing hidden from you. If you don’t know where to get started, just take a look at the built-in counters and copy/paste to make your own.

MONyog compared to innotop

If you’re wondering how MONyog compares to innotop, I think they’re quite different. MONyog seems much more counter-oriented to me; sort of like a Flash-based version of an RRD tool. I am not sure what the future will hold for it, but in my opinion it’s a little more narrowly focused at the moment.

Disregarding the different approaches, innotop’s feature set is much larger, of course; I haven’t seen any tool that even approaches innotop’s capabilities, so that should be no surprise.

Conclusion

I continue to be interested in all sorts of monitoring systems, both for MySQL and in general, and I’m trying to learn as much about them as I can. MONyog looks like the beginnings of a nice offering, though ultimately I am unlikely to use it since it’s not Free Software. The most important advantages over other monitoring systems, in my opinion, are the simplicity and ease of use, the agent-less installation, and customizability.

Technorati Tags:No Tags

You might also like:

  1. Installing innotop on FreeBSD and Gentoo
  2. How to update a GCC profile on Gentoo
  3. A look at innotop’s new features
  4. Version 0.1.146 of innotop released
  5. How to install innotop

MySQL Table Sync vs. SQLyog Job Agent

When I wrote my first article on algorithms to compare and synchronize data between MySQL tables, Webyog’s Rohit Nadhani left a comment on the article mentioning the SQLyog Job Agent, which has a similar function. Although I have been developing MySQL Table Sync essentially in isolation, I have been meaning to give SQLyog Job Agent a try. I recently did so, and then followed that up with an email conversation with Rohit. This article is about my experience using the SQLyog Job Agent from the command line, some thoughts on the algorithm as best I can deduce it, and benchmark results against MySQL Table Sync.

Description of SQLyog Job Agent

SQLyog Job Agent (SJA) is part of the SQLyog suite of tools. It is not designed as a stand-alone command-line tool, but is meant to execute jobs created by the SQLyog GUI tool. I have not yet tried the GUI, as I’m mostly a command-line user. (However, I know many people who use the GUI tool every day).

Since the jobs SJA executes are created by the GUI, there’s not much documentation for the command-line tool. It has no help output, but it wasn’t hard to create a sync .xml file by examining the included samples.

There is no changelog for SJA alone, but Rohit pointed me to the FAQ entry for SQLyog overall. The earliest mention I can find is October of 2003.

My experience using SJA

I downloaded version 5.27 of SJA on April 2, noticed some potential issues with it, and contacted Rohit to discuss those. I saw it was issuing the statements to resolve differences in a sequence that would cause problems — DELETE, INSERT, UPDATE. Indeed, I browsed the help forums and saw this order of operations was an attempt to fix problems caused by syncing in the order INSERT, UPDATE, DELETE:

We recently changed the order of operation for DATASYNC from INSERT -> DELETE -> UPDATE to DELETE -> INSERT -> UPDATE. That can be confusing (rows will be deleted and next inserted), but in the end data should come there.

There can still be problems even with the new order of operations, and I saw strange behavior in the sync jobs I ran. For example, to sync a table that was merely missing 500 rows, it was deleting 103 rows and then inserting 603, instead of just inserting 500. I asked Rohit about this, and he confirmed it was a bug that was fixed in version 5.28:

This algorithm was introduced for a very small period of time to handle “live” changes on a source table during sync of that table. Now we have changed it to update/delete/update. Update includes both INSERTs and UPDATEs. You should download 5.28 and try.

The extra phase for update is required only if you choose to delete “Extra rows from the target”. Consider this situation: After updating the target in Phase-I, we want to find out extra rows in the target. During this period, a source row changes. The target assumes that this data is “extra” and delete it from itself. So you might land up with an “non-synced” dataset most of the times in a live database.

I must have downloaded 5.27 only a few hours before 5.28 was available. In any case, I re-downloaded (it’s just over 1MB — not large) and as Rohit promised, the issues I saw were gone.

I also mentioned some other minor things I saw in the query log output, such as possibly redundant queries, and Rohit indicated those would be fixed in the next version.

SQLyog Job Agent’s sync algorithm

I would never reverse engineer a closed-source application, but peeking in the query log is fair game! I found most queries fairly straightforward. SJA finds differences with checksum queries, which appear to be inspired by Giuseppe Maxia’s work on remote database comparison in 2004. Here’s a typical query, abbreviated to fit on the page:

select  left(concat(IF(`col1`<0,'-','+'), lpad(abs(`col1`),9,'0')),4),
   concat(
      sum(conv(substring(md5(concat_ws(",",[all columns])),1,8),16,10)),
      sum(conv(substring(md5(concat_ws(",",[all columns])),9,8),16,10)),
      sum(conv(substring(md5(concat_ws(",",[all columns])),17,8),16,10)),
      sum(conv(substring(md5(concat_ws(",",[all columns])),25,8),16,10))
   )as hashkey,
   count(*)as yog_cnt, `col1`
from test2
group by 1 order by 2

Here are the first few rows resulting from that query on my test data set:

+------+------------------------------------------------------+---------+-----------+
| [..] | hashkey                                              | yog_cnt | col1      |
+------+------------------------------------------------------+---------+-----------+
| +411 | 1034880993212471840918027163413727068358             |       1 | 411149050 | 
| +100 | 10388239781124433886971298309216711174110863         |       6 | 100356640 | 
| +483 | 104843946004106862890734106456706210129920770876     |      49 | 483694780 | 
| +284 | 10504012808811182574082112225135699596455074096      |      51 | 284017580 | 
| +368 | 1054403046550107382218321910460918668291081851629911 |     489 | 368027560 | 
+------+------------------------------------------------------+---------+-----------+

As SJA finds differences between the tables, it adds WHERE clauses to the checksum query, narrowing the range of rows by limiting the upper and lower boundaries of the rows that are being checksummed. Here’s a typical WHERE clause:

where   `col1` >= 219000000 and `col1` < 220000000

In subsequent queries SJA also increases the size of the substring it takes on the first column, from 4 to 7 to 10 leftmost characters. If you ignore the sign digit, this means it is narrowing the grouping by 103 rows each time, or in other words grouping the current working set into a maximum of 1000 groups. This is very similar to the algorithm I proposed in my first article, as a fallback mechanism when the DBA cannot use an index to design a grouping strategy.

Beyond this, SJA seems to do the kinds of queries you’d expect a sync tool to issue.

Potential weaknesses in SQLyog Job Agent’s checksums

I’ve worked hard to design a very strong checksum algorithm for MySQL Table Sync to detect when rows have changed. I’ve spent many hours consulting with several experts, including someone who studied applied statistics for a PhD at MIT. After nearly a month of work on this algorithm, I was eager to see the SJA checksum algorithm.

SJA slices the base-16 checksum into four strings 8 characters long, converts those into base 10, and sums them over the group. It then concatenates the resulting four integers together to form a single string of digits. This represents the “checksum” of the group of rows.

There are a couple theoretical weaknesses with in this approach. SUM() is commutative, so the order of the rows in a group is immaterial, which is a good thing. However, it might be possible to overflow a BIGINT with the SUM() over a large group. This seems unlikely, but 8 hex digits is 32 bits, and since BIGINT math is signed for aggregate functions in MySQL (except for the bitwise functions), that leaves 31 bits of headroom, which is just over 2 billion. Lots of people have tables with more than 2 billion rows. Granted, you still wouldn’t overflow unless every value in the set was FFFFFFFF, but who knows what might happen, especially if you have many more rows.

The next potential problem is the law of large numbers. Using SUM() increases the likelihood of a collision. It changes the distribution of numbers from pseudo-random over the range to a normal distribution — the familiar bell curve. Certain numbers will be more likely to occur than others, and this likelihood increases as the set grows.

Finally, string concatenation of base-ten digits discards the most significant digits. If you convert the four sliced hex strings to base ten and they end up being 1, 2, 3, and 4, and then you concatenate them, you get 1234. But the sum of the checksum is not 1234; it is 1*1624 + 2*1616 + 3*168 + 4*160. This truncates the full 128-bit range of MD5().

Rohit responded to my concern:

Yes, there are chances of collisions. I did the math 3 years back(when I had designed the algo.). I don’t remember exact details, but the chances of collision are extremely rare. Of course, it is not as good as a “pure” MD5. In the last 3 years of selling this (a vast majority of our 8000+ paid customers use it), we have not encountered any cases where collisions have been an issue.

I believe collisions would be undetectable, but I don’t really know how SJA works inside (tangent: I keep wondering if there’s a way to use something like the accounting trick of differences divisible by nine to help see which rows are bad without doing so many grouped queries). Perhaps there is a way to know when there has been a collision. In any case, I checked the tables with MySQL Table Checksum after syncing, and they were correctly synced.

For what it’s worth, MySQL Table Sync’s algorithm doesn’t have these theoretical weaknesses.

Performance analysis of SQLyog Job Agent’s queries

Before I benchmarked SJA against MySQL Table Sync, I took a few minutes to analyze the queries it uses. It looks to me like there may be some room for optimization. There are several full table scans, some of which might be combined (for example, the initial COUNT(*) and MAX(CHARLEN()). However, these are one-off queries; the real place to optimize is in the repeated queries.

I believe the checksum query could be optimized to do only one call to MD5() per row, instead of four. I don’t think MySQL recognizes the MD5() sub-expression as something whose value can be re-used instead of calling again. Can anyone confirm this?

The GROUP BY and ORDER BY clauses also don’t use indexes. I believe a little rewriting could get them to use the primary key, which would avoid a temporary table and filesort.

Benchmark method and results

I did an informal benchmark of SQLyog Job Agent and MySQL Table Sync, whose results I should stress are not scientific. Since I already had a realistic data set for the benchmarks I ran last week, I just re-used that. You can download the sample data I used, and the full benchmark results and scripts I used to run the benchmark.

I ran the benchmark on my laptop, which is so old and slow you can literally hear the circuits make noise when there is a context switch. It has plenty of memory, but not much of anything else! I am using Ubuntu 6.10 and the MySQL version is 5.0.24a-Debian_9ubuntu2-log. One consequence of running the benchmark on a single computer is that I/O becomes sequential for both tools, whereas SJA should be able to take advantage of asynchronous I/O if it is communicating with two different servers.

I used mostly the same queries as in my previous benchmark. The table has 50,000 rows, and in subsequent tests I deleted a random 5 rows from the destination, then 500 rows, then updated 1 row, then deleted all rows where col2 = 60 (as before, it’s 11,424 rows), and finally deleted all col2=60 rows from the source instead of the destination. These are the same scenarios I ran in my earlier benchmarks, except the last which is new.

I used MySQL Query Profiler to measure the server activity and elapsed time. The file “commands.txt” included in the aforementioned download is the input I sent to mysql-query-profiler. (I used a new feature I added to the profiler, which I have not yet packaged and released).

The following is a summary of the benchmark results. SJA means SQLyog Job Agent, and MTS means MySQL Table Sync:

You can see MySQL Table Sync performs somewhat better overall on this data set, and sometimes performs much better. Overall it runs in about 28% the time, doing about 28% as much I/O and sorting only 36% as many rows. I don’t want to speculate too much, but it seems to me that most of the difference is probably the reduced I/O, with more efficient queries a distant second. I designed MySQL Table Sync to be network-efficient, so this result does not surprise me.

Miscellaneous thoughts

SJA and MySQL Table Sync are not really designed for the same purposes. Though both can sync data between remote tables, MySQL Table Sync is explicitly designed for network efficiency and guaranteed consistency when syncing, even while the server is being used. I’m not done with it yet, but it already has a variety of options a smart DBA can use to sync tables — especially on replication slaves that have become corrupt — more efficiently than a generic algorithm that applies to all table structures. As far as I know, SJA doesn’t offer these features. On the other hand, it can do a lot of things MySQL Table Sync cannot, such as sync schema differences as well as data differences. To some extent then, this comparison is apples to oranges.

For example, I’m not sure exactly how the SJA does its deletes, inserts and updates, but I believe the only order of operations that’s correct in every case is DELETE, UPDATE, INSERT. But perhaps there are other considerations when you are doing more complicated types of syncing, such as two-way syncs. I don’t know any way to guarantee a point-in-time consistent two-way sync on tables that are being written to on both servers. I suspect SJA cannot guarantee this level of consistency either. My goals are a little different; I’d prefer to do a simpler task with a guarantee of consistency than a two-way sync with potential for inconsistency (you can always run MySQL Table Sync twice to do a two-way sync).

Rohit was very kind to spend time discussing SQLyog Job Agent over email with me, and I appreciate Webyog very much for their contributions to the MySQL community and to open source, so I was careful to ask permission to write this article. That’s not something I’d normally do, but since I’m peeking into the query log of a commercial product, analyzing it, and benchmarking it I wanted to be respectful and err on the side of caution. If Rohit had expressed any discomfort with me discussing SJA I probably wouldn’t have written this. In response to me asking “Is there anything you would like me to avoid writing about,” Rohit wrote:

Nothing in particular. My only concern is that the usability should not be judged by the command line usage. Another thing that I would like to highlight is that SJA communicates with servers in different threads so the database communication time is not “added” up!

Both points are well put. I was initially surprised that there’s no command-line help for SJA, but once I understood that it’s not meant to be stand-alone, it made sense to me. And SJA’s asynchronous I/O is a very smart design; my choice of Perl as a programming language has necessarily limited, or made more difficult, these kinds of optimizations.

Conclusion

I found SQLyog Job Agent to be a well-rounded tool for syncing data between MySQL tables. Though not designed purely as a stand-alone tool, once I figured out the XML job file format, it was easy to use. My analysis showed me some areas where there’s theoretically a possibility of incorrectly syncing data, but I never observed that happening. I ran some unscientific benchmarks and found that my design for MySQL Table Sync is several times more efficient for my test case in terms of network I/O, which seems to be the major contributor to the time it takes to sync tables.

Technorati Tags:No Tags

You might also like:

  1. A progress report on MySQL Table Sync
  2. Progress on Maatkit bounty, part 2
  3. Progress on Maatkit bounty, part 3
  4. How MySQL replication got out of sync
  5. Comparison of table sync algorithms

Review of Pro Nagios 2.0 and Nagios System and Network Monitoring

Last week I read two books on Nagios. I found one easy to use and the other difficult.

The books

Cover of Nagios System and Network MonitoringNagios System and Network Monitoring (Wolfgang Barth, No Starch Press, 2006) is a delight. It explained Nagios briefly and clearly, showed me how to get it running, and continues to be a useful reference. I rate it 4 out of 5 stars.

Cover of Pro Nagios 2.0Pro Nagios 2.0 (James Turnbull, Apress, 2006) is hard for me to read, understand and use. Mr. Turnbull is clearly an expert on Nagios, but the book doesn’t communicate his knowledge effectively. I give it two stars.

Despite the differences, the books are identical in many ways. Both have chapters on the same topics. They cover exactly the same subject matter in the same level of detail, have the same target audiences, were written about the same time, are about the same size, and so on. The differences are in organization and writing style.

Mr. Barth’s book lets me do a breadth-first search of the subject matter, and Mr. Turnbull’s makes me do a depth-first search. Breadth-first is a better way to learn a topic like Nagios.

Pro Nagios 2.0 (James Turnbull)

I read James Turnbull’s Pro Nagios 2.0 first. I brought it home for the weekend, and read past page 200 — more than half the book — but I did not learn much about Nagios. Here’s why I found it hard to use:

  • It explains difficult concepts abstractly. Here are some of the first sentences from Chapter 2’s introductory paragraph:

    You do this by defining a series of objects that represent the characteristics of the environment being monitored. You begin by defining your assets to the Nagios server. Nagios calls them hosts. Then you define the attributes and functions of these assets…

    The book would have been clearer if it used the concrete Nagios metaphors (hosts, services, contacts), but it describes many things in object-oriented programming terms.

  • It’s hard to skim because it’s organized depth-first. It usually begins a topic with a few sentences, then interrupts with a tip, note or caution. Then the text resumes with specific details, and demonstrates them with code, usually a configuration sample or a shell script.

    There are few unbroken paragraphs of prose. This makes it hard to scan for headings, because they don’t stand out visually.

  • It puts unimportant material before important topics, but you don’t know that in advance. Here’s an example: if a service is okay, Nagios assumes the service’s host is too, so it’s generally unnecessary to monitor hosts. Turnbull writes about how to monitor the hosts in detail before saying it’s not important — for instance, beginning on page 39; then on page 42 he writes, “Remember: don’t schedule regular checks of your hosts!” Again in the beginning of Chapter 5, “Monitoring Hosts and Services,” the section on hosts is before the section on services.

  • Its confusing sentences and passive voice make it hard to understand. For instance,

    These directives can either be unique to the type of object being defined — for example, a host object has a directive that defines its address, or generic and applicable to a number of object definitions.

    This confuses me because I can’t immediately find the “or” that should correspond to the “either,” and when I do it seems to go with the “for example,” where it makes no sense.

  • It’s too verbose, often using a paragraph when a few words would be clearer. Here is an example from page 37:

    Table 2-2 contains four columns. The first two columns indicate the directive name and description. The third and fourth columns indicate if the directive’s value can be used as a macro and whether it is mandatory to the object definition. An object definition must have all the required mandatory directives included in the definition to be valid. An invalid object definition will result in Nagios failing to start. I’ll discuss macros in a bit and in the “Defining Commands” section later in this chapter.

    I can count the columns myself, so I don’t need the first sentence, and the columns have headings, so I don’t need the next two either. The fourth is redundant both to the column headings and to itself. The fifth is in the passive voice, and the sixth is a forward reference. The paragraph’s essence is “Nagios will not start if you omit a mandatory directive,” which states the obvious. I’d delete the whole paragraph.

  • It’s self-contradictory. For example, on page 44 the book shows how to enable “freshness” checks:

    In order to enable freshness checking, there are some other directives in the main nagios.cfg file that also need to be set. These control freshness checks for hosts. I’ve listed these directives in Example 2-8.

    Example 2-8. The nagios.cfg Freshness Items

    check_host_freshness=0
    host_freshness_check_interval=60

    This talks about enabling freshness checks and then shows the configuration setting that disables them. It makes me backtrack to be sure I read it right. This happens fairly often.

  • It interrupts with frequent references, which rarely have page or chapter numbers with them and usually introduce repeated material. Since they don’t have page numbers, you can’t find where they refer to; in one case I couldn’t find a reference even after looking for quite a while and consulting the index. Again, it’s hard to locate sections by scanning for headings, so the lack of page numbers makes the references nearly useless. Forward references are especially distracting.
  • It over-uses Note, Tip, and Caution sections. Chapter 2 has at least 47 of them in 52 pages, and there are seven in pages 220-223. They are highlighted with whitespace and two horizontal rules that span the page from edge to edge, so they’re very intrusive. Some examples: on page 59, while explaining equations that control a scheduling algorithm, a note says “ceil indicates that you need to round up the result to the nearest integer.” And on page 221, while discussing third-party tools, a note says “The NAN client and daemon are written in Perl,” which is irrelevant. Some of these would have been okay as footnotes, but many should have been omitted.

After a while I tried to skim through and read the book at a higher level, but I found it very difficult.

It’s not impossible to use, and I don’t mean to criticize too much. For example, once I was familiar with Nagios I found it useful for reference a few times. If you know what you want to find, it’s easy to navigate the book with the table of contents or the index (though the index doesn’t seem to be complete). I remembered seeing something about how to organize configuration files in various ways, and I referred back to that since Mr. Barth’s book didn’t mention it. This discussion, on pages 32-35, helped me think about how to arrange configuration files into subdirectories with the cfg_dir directive.

On the whole, I found the book a poor introduction to Nagios.

Nagios System and Network Monitoring (Wolfgang Barth)

Mr. Barth’s book succeeds through clarity, organization, detail, and ease of use. His first language is apparently German, so his English is not always elegant, but it is concise and communicates his knowledge of Nagios well.

The best thing about the book is the organization. The first part, “From Source Code to a Running Installation,” is three compact chapters on everything you need to get Nagios working. The first chapter is about compiling, installing, and basic web interface configuration. The second is an overview of general Nagios configuration — the stuff you need to actually monitor things. The third is about checking your configuration and starting Nagios.

By page 55 I understood more about Nagios than I was able to glean from Pro Nagios 2.0. When I finished the first three chapters, I had Nagios running and monitoring itself.

The second part is called “In More Detail,” and it was very helpful while I configured Nagios to monitor other servers. It was a useful reference when I looked up specific topics, and easy to skim for important instructions and pitfalls. The good introductory sentences make it easy to decide whether I need to read a section, and the uninterrupted text makes it easy to skip if I don’t. Large blocks of text make headings stand out, so there’s less need to use the table of contents. References have page numbers.

There are few footnotes, and no interruptions in the text. When there are configuration examples, the book shows the whole configuration definition for context, but makes the relevant parts bold, so they’re easy to see. There are lots of diagrams, which clearly show concepts that would require hundreds of words to explain. Captions are in the margins so they don’t interrupt the text.

I could go on, but suffice to say it’s a delight to read and use this book.

On the downside, I dislike the typeface the Barth book uses. Though I’m no expert on typefaces, I think it has a hard-on-the-eyes combination of shapes, heavy line-weights in odd places, and internal white-space. The typeface is sans-serif, so the letters are harder to recognize; for example, the “c” looks a lot like the “e.” It is difficult to speed-read. On the other hand, the text is fully justified and hyphenated (the publisher told me he believes it was typeset with LaTeX, which explains a lot — I love LaTeX). This makes it easier to read than Apress’s standard ragged-right typesetting.

The difference between the books

Let me show you the differences between the two books by comparing how they explain “flapping,” a Nagios feature to prevent a flood of notifications from services that are cycling between states.

The earliest index entry for flapping in the Turnbull book is on page 47. On pages 47 and 48, Turnbull explains the algorithm in detail:

So how does Nagios determine if a host or service is flapping? When you enable flap detection, Nagios keeps a record of the last 21 states of the host or service in an array. It then counts the number of times the states in the array have changed. This is calculated as a percentage. For example, with 21 states recorded we have a possible 20 state changes. For a normal-behaving host or service, it may be that the last recorded 21 states were OK. Hence, the percentage of states change is 0 percent. If, however, during the last 21 recorded states the host or service changed state 9 times, then the percentage of state change is 45 percent. But there is another added layer of complexity in this calculation. The states in this array are weighted — the newest state is considered 50 percent more important than the oldest state. This is because Nagios considers that the newer states are more indicative of the current behavior of the host or service than the older states.

Compare those 175 words to page 219 in the Barth book:

If a regular test shows that service or computer is changing its data continuously, this is called flapping in Nagios.

The Barth book says no more on the topic, but refers interested readers to the appendix on page 401, where it explains flapping with a nice diagram.

This illustrates the major differences between the books:

  1. The Barth book tells you where to learn more, if you want to know. The Turnbull book practically forces you to read through irrelevant details.
  2. The Barth book presents minor concepts like flapping later in the book (page 219 and 401). The Turnbull book puts the trivia in the beginning (page 47).
  3. The Barth book uses a dozen or so words, or a diagram. The Turnbull book uses hundreds of words instead.

Summary

I like Wolfgang Barth’s Nagios System and Network Monitoring better than James Turnbull’s Pro Nagios 2.0.

Subscribe to get my future articles conveniently delivered to you via email or feeds.

Technorati Tags:No Tags

You might also like:

  1. Progress on High Performance MySQL, Second Edition
  2. Progress on High Performance MySQL Backup and Recovery chapter
  3. Progress report on High Performance MySQL, Second Edition
  4. Organizing High Performance MySQL, 2nd Edition
  5. Get a free sample chapter of High Performance MySQL Second Edition

A review of Pro MySQL (Kruckenberg and Pipes, Apress 2005)

Pro MySQL, by Mike Kruckenberg and Jay Pipes, covers advanced MySQL topics, especially focusing on optimization and internals. I give it a solid 4 stars.

Overview

Pro MySQL book cover image Pro MySQL covers a lot of topics you need to know, and probably will not learn about anywhere else.

The book’s target audience is MySQL professionals. The authors assume significant depth of background in the subject area, and strive to put the finishing touches on your education — to take you from apprentice to journeyman, so to speak. I think they do a great job.

The writing is lucid and well-organized. It’s also well-edited; apparently this book was not rushed to press as so many technical books are. It’s exceptionally easy to read and understand. There are plenty of examples, but they do not interfere with the writing. Paragraphs are long enough that the pages are filled with large, solid chunks of text, without interruptions or distractions. Even footnotes are unusual. Unlike many other technical books, the page count is not inflated with endless code listings! Oh, and I don’t think I saw any typos — I can’t remember the last time I read a typo-free technical book.

Pro MySQL actually goes beyond “easy to read” and “well-organized.” In fact, it’s downright engaging. I found it so much fun to read that I put aside my wife’s Christmas present (two volumes on the history of World War II, which is super-interesting) and started carrying this book with me everywhere till I finished reading it. I read it over lunch at work, in the waiting room at the doctor’s office, and everywhere else I got a chance. It’s that interesting.

On the downside, there are a couple of places where the book restates the official manual, and a technical blip or three. However, these are few and relatively minor.

Organization

The book is organized in two parts. The first, “Design and Development,” comprises the first 13 chapters, which I would roughly group into advanced technical topics and new features in MySQL 5.0. The second part, “Administration,” covers the usual topics like installation, security and backup, but also has sections on replication and clusters.

I wanted to see how long and how many of each kind of chapter there are, so of course I threw it into a SQL table (spreadsheets are for wimps).

Note: the “section” is my own way of grouping the chapters, as I explained above. Here’s a graphical view of the book’s organization, using one of the techniques I learned from SQL Hacks:

select section,
   count(*) as num_chaps,
   avg(len) as avg_chap_len,
   repeat('#', sum(len)/15) as total_pages
from chapters
group by section
order by field(section, 'Background', 'MySQL 5', 'Administration');

You can see they devote significant space to explaining the background and theory of using MySQL (and indeed relational databases in general), but not so much to the new features in MySQL 5. This is to be expected, as MySQL 5 was brand-new at the time.

The following sections will explore these chapters in more detail.

Part 1

Part 1, “Design and Development,” has 13 chapters that pretty much fully cover the topics a well-rounded software engineer needs to master for professional MySQL usage. There are a few weak spots, but for the most part, this material is all necessary to the book — no long, boring chapters I wanted to skip to “get to the good stuff.” It’s almost all “good stuff.”

Chapter 1 might be considered skip-worthy, because it’s about team roles, why you might choose MySQL, and so forth. However, it still deserves reading because a) it’s short and b) you should be interested in what Mike and Jay have to say about it. They’re people you can learn from. They also give a very good overview of many RDBMSs, not just MySQL, and the strengths and weaknesses of each. If you’re trying to decide which products to consider for your business, you should read this section. It is very balanced, one of the best such I’ve seen.

Chapter 2 covers index concepts in significant technical detail. This is the missing link most engineers need: how indexes are built, how disks work, clustered indexes, etc. If you don’t know this material, you will make bad decisions about indexes.

Chapter 3 is about transactions, both in theory and how MySQL implements them. Topics include logging, recovery, and checkpointing. Many people consider this relevant only if you’re using the InnoDB storage engine, but in my opinion you’re pretty likely to need InnoDB for professional applications, so if you’re trying to learn what you’ll need to know, you should not only read this chapter, but come back to it after you finish the rest of the book.

Chapter 4 really digs in. It takes you on a guided tour of the MySQL system architecture, with frequent looks at tastefully abridged source code (and lots of references to where you should read more on your own). The chapter moves from an overview into resource management, storage engines and handlers, caching and memory, and so forth. It finishes up by tracing the execution path of a simple query, which is invaluable insight into how things work in the server; if you’ve ever tried to read the source code yourself, you’ll really appreciate this. MySQL is well-designed, but there’s a lot of code, and it helps a lot to have someone give such a clear overview.

Chapter 5 discusses storage engines in detail, especially MyISAM and InnoDB, which is appropriate given their popularity. For these two engines, the book explains everything down to the individual bits in the record format on disk. Other storage engines get about one page each. The chapter closes with a nine-page overview of what you need to think about when choosing a storage engine.

Chapter 6 covers benchmarking and profiling. The authors claim that these skills set professionals apart, and I agree. In fact, long before I read this book I wrote a MySQL query profiling tool, which as far as I know is still the only one of its kind (most developers seem to think it’s enough to run a query and see how long it takes).

Chapter 7, “Essential SQL,” is one of the most important chapters, but it isn’t as uniformly good as the rest of the book. It’s a bit too basic in places, and I think the authors should have assumed their readers were already trained in some of the topics they cover, such as the different types of joins. On the plus side, they open the chapter with recommendations for writing good SQL code, and I was happy to see that; this is something professionals care about, but few people advocate otherwise (the authors agree almost wholly with my own SQL coding standards, which made me smile).

This chapter also repeats some material from the official manual, and contains some technical bloopers, such as an erroneous explanation of dependent subquery optimizations on page 268, which might lead some unsuspecting programmers to write a subquery that will run longer than the half-life of hydrogen. Alas.

The more advanced parts of Chapter 7 also made me think “yes, but there’s more to it than that” a few times. On page 282, they show different ways to rewrite a “find the max row” type of query that aren’t really equivalent in all cases, but they don’t say so. Sometimes there are ways to write queries as joins when they claim it can only be done with subqueries. And on page 286, they actually give the correct answer to the situation they incorrectly analyzed on page 268. However, on the whole Chapter 7 is well worth spending a lot of time with.

Chapter 8, “SQL Scenarios,” is really good. It’s filled with the types of things Peter Zaitsev and I tend to write about — how to optimize so multiple indexes can be used, duplicate entries, and so forth. There’s a very clear explanation of the nested set model of storing hierarchical data, the best I’ve read in fact. This chapter has more code listings than others, but there’s still a high prose-to-code ratio, so your eyes won’t glaze over.

You should spend a lot of time reading chapters 7 and 8. These are probably the two chapters you’ll come back to most often, too. They’re the core of the book for most readers, in my opinion.

Chapters 9 through 13 cover features new in MySQL 5: stored procedures, functions, cursors, views and triggers. As I showed above, these chapters are quite a bit shorter on average than the others. The depth of coverage is correspondingly less. The tone tends to be a bit guarded, too; I think the authors wanted to explain the new features, but since they hadn’t been hammered on by the community very extensively yet, there just wasn’t enough practical experience to know and cover all the nuances.

Part 2

The second part of the book is about administration. While some of the material isn’t unique to the book, it is well-written and organized, making it useful anyway. Some parts cover topics I haven’t found well-presented elsewhere.

Chapter 14 is a fairly routine discussion of installation and the most basic configuration, which is at roughly the same level as Chapter 1 in the first part of the book. Chapter 15 covers user administration. The sections discuss privileges, how they work, and how to manage them from the command line and with the GUI MySQL Administrator tool (which unfortunately I have never been able to get to work for this purpose; it always hangs. Maybe I should file a bug on that). Chapter 16, “Security,” is also not terribly in-depth, but again it provides full coverage of topics a professional needs to know. Ditto Chapter 17, “Backup and Restoration.”

Chapter 18, “Replication,” gets into the more advanced topics again. There’s a lot of information about how replication really works; this can be gleaned from the manual, but it’s nicely presented here. Some of it I haven’t been able to find in the manual, for example, what each line in the master.info file means. Chapter 19, “Cluster,” is at a similar level of depth. If you want to know whether replication or clustering will solve some specific scenario, so you know whether to look into them in more detail, these chapters will probably answer that for you. They will probably not be sufficient information for you to successfully set up and manage these features, though. You will need to read the manual, and in the case of NDB Cluster, you should probably buy the book from MySQL (MySQL Clustering by Alex Davies and Harrison Fisk, MySQL Press, 2006). To be fair, at the time Pro MySQL was published, I think it was the only material on Cluster in print.

Chapter 20 is about troubleshooting, and explains where you should look and what to investigate when something goes wrong. This will help with basic trouble, but the real troubleshooting guide is the whole book — if you’ve read the rest of the book, you already know more than you’ll learn from this chapter. And finally, Chapter 21 introduces another feature new to MySQL 5.0, the INFORMATION_SCHEMA database, which is part of the SQL:2003 standard.

On the whole, Part 2 isn’t the strongest part of the book, but it’s definitely worth reading.

Summary

Pro MySQL belongs on your bookshelf if your company uses MySQL. It belongs in your hands if you are responsible for MySQL at your company.

This isn’t a typical 10-pound doorweight book full of “code listings” and screenshots. It does have a high page count, but for the most part it’s all packed full of information, most of which you either won’t find elsewhere or won’t find in any one place.

The best part of the book is the excellent writing and organization. It makes it clear and engaging. It’s an easy read, even when the material is highly technical and difficult to understand. Kudos to Mike and Jay for bucking the trend and writing a really good technical book!

Technorati Tags:No Tags

You might also like:

  1. Progress report on High Performance MySQL, Second Edition
  2. Progress on High Performance MySQL, Second Edition
  3. Organizing High Performance MySQL, 2nd Edition
  4. More progress on High Performance MySQL, Second Edition
  5. High Performance MySQL, Second Edition: Schema Optimization and Indexing

A review of O’Reilly’s SQL Hacks

I recently read O’Reilly’s SQL Hacks, by Andrew Cumming and Gordon Russell. It’s an interesting and rewarding mixture of tips and tricks for novice to expert users.

Overview

The book consists of 100 hacks, organized into 12 chapters whose topics and level roughly progress from complete novice to advanced. There are 366 pages of real content. The index is good; for example, on the topic of dates, which is one of the main things I use references for, it has dozens of entries, many pointing to the same entries but worded in a different way, which can be very helpful for such a topic.

I give it 3 out of 5 stars because there’s some really good content, but there’s also some that shouldn’t have made it past technical reviewers, and it could have been better organized and more carefully edited.

If you approach the book as the authoritative source of expertise on how to solve every problem, you’ll be disappointed (and if you don’t know any better, you might be misled into some very bad solutions to particular problems). If you approach it as the opinions and suggestions of several people, who have things to teach you but are by no means the last word on any given subject, you may get a lot out of it. I did.

Incidentally, many of the hacks appear on this and other websites, but that shouldn’t stop you from getting a copy.

There’s some really good content

Some of the hacks are great. For example, Hack #24, “Multiply Across a Result Set,” shows you how to make up for the lack of a built-in PRODUCT() aggregate function with EXP(SUM(LN(col))). It’s simple, but I never thought of it, and now I have it tucked away for easy reference when I need it. That’s worth something — I won’t know how much until I need it.

There are also some great hacks about date math, which is always a bear. Having the complex math to find the second Tuesday of the month, or the last Thursday, easily at hand is great.

These and other gems make the book worth both reading through once and owning permanently. It was worth it for me to read once for the great ideas it gave me, like #24, and so I’ll know whether it contains a solution to something I’m working on. It’s worth owning permanently because I want to have it at hand for quick reference when I’m solving problems that are tough to memorize, and don’t want to derive the solutions from first principles. Date math falls into that category.

Reviewers were asleep at the wheel

Update: I assumed too much below; perhaps the technical reviewers did raise concerns but were not listened to, or something else might have happened. I won’t change what I wrote, so you can see my thoughts as I originally wrote them, but I want to say I’ve thought better of it now.

Some of the hacks miss obvious good solutions and propose bad ones instead. Take #16, “Search for a String Across Columns.” The hack is introduced with the question “does anyone have yellow anywhere in their room?” Then it advocates bludgeoning this query to death with the following blunt instrument:

select name from bedroom
where concat(floorcolor, ceilingcolor, wallcolor) like '%yellow%'

That really makes me cringe. There are many ways to do that better, most obviously:

select name from bedroom
where 'yellow' in(floorcolor, ceilingcolor, wallcolor);

The query isn’t written to return results where someone’s room has ‘yellow-orange’ or some other partial string matching. It explicitly states that concatenating, and then using a pattern matching operator, is better because it helps you avoid errors in typing. I disagree with this cost-benefit analysis; it’s an ugly kludge that will completely defeat indexing on any system I know of (the book mentions that it “will not usually employ an index,” which is an understatement).

It also won’t work right, because it smashes values together and loses the boundaries between them, which they go on to solve (in most but not all cases, but they don’t say that) with an uglier version of the same thing, instead of writing the query right.

This trick shows up again in Hack #88, which is also better solved with standard SQL. I wrote an admonition not to use these techniques in my employer’s copy of the book. Where were the technical reviewers on these? These techniques are a DBA’s nightmare, and any reviewer decently familiar with SQL should have cried foul and presented the better ways to solve the problems.

There are some syntactic issues, too. The book shows you how to run each of the examples on several different RDBMSs, which is good, but sometimes incompatibilities they mention don’t really exist. For example, CREATE TABLE SELECT in MySQL can also accept an optional AS, so it can be compatible with other systems, but the authors don’t say that. You also don’t need to prefix literal dates with DATE in MySQL and SQL Server, so some of the date examples could be rewritten to run without change on more systems.

Sometimes standard good practices fall by the wayside. For example, Hack #24 uses the != operator, which isn’t standard SQL.

Better editing would increase its usefulness

The book isn’t very clearly organized. Sometimes topics are hidden inside other topics. For example, if you want to know how to reduce the precision of numbers and dates, you wouldn’t look for it inside a hack titled “Break it Down By Range,” which is about grouping data into buckets. Similarly, if I want to find out how to get the top row from each group, I’m not going to look in Hack #88, “Test Two Values from a Subquery.”

There’s also some repeated material. That same hack repeats Hack #83, “Find the Top n in Each Group.” Perhaps it’s just for the sake of having a realistic example to illustrate the hack. If that’s the case, I’d change my criticism to “some of the examples are about something other than the hack and distract from its purpose.”

There’s too much variability in the style of the SQL examples. There is no consistent coding style, which is an even better idea in a book than in real code. Query results are sometimes shown as fancy MSWord-style tables and sometimes as MySQL-style textual bars and dashes.

Finally, many of the best hacks aren’t really by the listed authors, but by Rudy Limeback. But his contributions aren’t edited for style and consistency, so they feel very out of place, as though they were just copied and pasted out of an email. This is fairly jarring to read. (The astute reader will have noticed that I like even technical books to be a delight to read; perhaps you won’t have the same neuroses as I).

Summary

Don’t take my criticism too strongly; the book is worth reading and then keeping, as I said. I learned a lot from it and it’s among the more valuable books on SQL I have read. In fact, given that most books on SQL are just regurgitations of freely available manuals, it’s a standout.

Technorati Tags:No Tags

You might also like:

  1. How to Break Web Software
  2. Credit card expiration dates should conform to standards
  3. Why you need to read this blog
  4. More progress on High Performance MySQL, Second Edition
  5. A review of Pro MySQL (Kruckenberg and Pipes, Apress 2005)

Version 3.0 of mysqlreport released

Daniel Nichter has released version 3.0 of mysqlreport, one of my favorite tools for quickly comprehending the overall state of a MySQL server. The new version prints out the most important information about InnoDB.

It looks like this:

 $ perl mysqlreport --innodb-only
MySQL 5.0.26-standard-l  uptime 3 9:57:51       Fri Dec  8 17:29:07 2006

__ InnoDB Buffer Pool __________________________________________________
Usage           1.25G of   1.25G  %Used: 100.00
Read ratio      0.002
Pages
  Free              1            %Total:   0.00
  Data         78.94k                     96.37 %Drty:   0.01
  Misc           2976                      3.63
  Latched           0                      0.00
Reads           3.47G   11.7k/s
  From file     6.30M    21.4/s            0.18
  Ahead Rnd    216772     0.7/s
  Ahead Sql    181211     0.6/s
Writes        811.05M    2.7k/s
Flushes         4.16M    14.1/s
Wait Free           0       0/s

__ InnoDB Lock _________________________________________________________
Waits             680     0.0/s
Current             0
Time acquiring
  Total        492478 ms
  Average         724 ms
  Max            5182 ms

As always, very helpful… just the facts, nothing more. I have 1.25 GB of buffer pool, 100% used, very small percentage of dirty pages, etc etc. You can see it all at a glance.

It pulls the data from SHOW STATUS, which means it only works on newer versions of MySQL. Those variables are available in 5.0.3 and later, if memory serves me.

Technorati Tags:No Tags

You might also like:

  1. A case study in profiling queries in MySQL
  2. Progress on High Performance MySQL, Second Edition

Permit Cookies: a Firefox extension that makes cookie whitelisting easy

I’ve been writing a lot lately about Perl, MySQL and so forth, and neglecting another topic that interests me: the Web. I’m going to finish and publish some of the drafts I have on the Web before I continue with another massive database marathon. This one is about someone who read my mind and built exactly the Firefox extension I was about to build myself: an easy way to block all cookies and allow them on a case-by-case basis without nagging.

By default, cookie whitelisting in Firefox isn’t easy

As you may know, I take a “say no by default” approach to blocking cookies on the Web. I disable all cookies by default, and only enable them on sites I want to have them (a “whitelist”).

This cookie whitelist is a bit of a pain, though. Every time I browse to a site and want to allow cookies, I have to open the cookie preferences and add the site to the whitelist. That ends up being eight clicks, and I have to type the domain too.

I could tell Firefox to ask me about every cookie, but that’s worse than no privacy at all. I don’t even know why there is such an option. It makes browsing so unpleasant, I’d rather just let everyone set cookies and be done with it. So that’s not a solution.

What I need is a non-disruptive way to 1) see when cookies aren’t allowed, and 2) allow them with just a click or two. I looked around the Mozilla extension site and didn’t find what I wanted. I wanted a bare-bones extension, and everything I saw was a complicated mess. That was quite a while ago.

I thought I’d have to write my own extension, but never got around to it. In the meantime, either someone wrote exactly what I wanted, or I got better at finding things. The Permit Cookies extension by Daniel Lindkvist is about as perfect as it gets.

How Permit Cookies works

It puts a small icon in the status bar:

Permit Cookies Status Bar

When I want to allow cookies for a site, a single click on the icon pops up a dialog with the top-level domain of the site I’m browsing already filled in:

Permit Cookies Dialog

The little “C” in the status bar turns green to indicate cookies are now allowed. Another click on that same icon later will let me change my setting for that site, to delete the setting and disallow cookies once more.

Thanks, Daniel! You made Firefox a lot easier for me to use!

Technorati Tags:No Tags

You might also like:

  1. How to guard your privacy with blacklists and whitelists
  2. How to install beautiful X11 cursors
  3. Copyright statement, privacy policy and terms of use

A review of the Glom graphical database front-end

Glom is an interesting graphical database front-end I’ve been meaning to try out for some time. Someone asked about graphical database front-ends on the #mysql IRC channel recently, and that prompted me to install Glom and learn how to use it. My overall impressions? It lands squarely in the middle of its target audience’s needs, but still has a quirk here and there. With a bit of polish it will be a fine product, and it’s already a winner over Microsoft Access and Filemaker, two similar programs with which you might be familiar. In this article I’ll walk through installing and configuring Glom, a simple database design, a quick peek under the hood, an archaeologist’s experiences using it, and give my opinions about Glom in detail.

Introduction to Glom

Glom is a GTK+ front-end to PostgreSQL, advanced open-source relational database software that is used widely for mission-critical business applications. Having a powerful relational back-end means it is built on serious technology, with all the advantages that gives. The choice of front-end technology gives it a very nice cross-platform graphical interface that runs natively on the Gnome desktop, and can run on Microsoft Windows and other platforms (though I don’t think Glom itself has been ported to anything besides GNU/Linux).

Glom lets you create simple database interfaces in a manner similar to Filemaker Pro or Microsoft Access, though it is far easier to use and has a true relational database behind it. With Glom you create the database and the interface together, not separately. Glom requires no programming — it just requires understanding your data model, which of course is a prerequisite to doing quality work in any similar product. If you know Python and want to add more functionality to your interface, you can add Python code to buttons and for generating calculated fields.

Glom has just one developer at this time, the talented Murray Cumming, who is also a developer or lead developer for several other important Free/Open-Source projects.

Installation

As I said, a message on an IRC channel spurred me to set aside time to install and use Glom. I’ve been reading articles about it for quite a while, with mounting interest each time, but never made time to try it out. I decided I would write an article about my experience, so I approached the project with the goal of doing a “typical” installation and creating some “typical” database just for fun. I searched the web for information about Glom and found very little written about it ahead of time, so I decided to try to fill in some gaps in what I did find.

Glom is packaged for installation on Ubuntu GNU/Linux, and is said to be dead simple to install. I decided not to install it on my Ubuntu laptop, and take the road slightly less travelled — I installed it on my Gentoo desktop machine instead. I did this because it seems people who install Glom on Ubuntu generally have a smooth experience, but I couldn’t find much written about installing it elsewhere.

Glom’s website has information about installing it on Gentoo by using the popular Break My Gentoo unofficial e-build repository. Though it sounds like an invitation to disaster, in fact it is just a repository of e-builds that have not yet gotten into the main Gentoo Portage tree, and you should not fear it. That said, this was my first experience using a Portage overlay. Assuming your Gentoo system has Subversion installed, and you don’t yet have any Portage overlays either, the following commands (executed as root) should do the trick:

cd /usr/local/
svn co https://svn.breakmygentoo.org/bmg-main/
echo 'PORTDIR_OVERLAY="/usr/local/bmg-main"' >> /etc/make.conf
echo dev-db/glom >> /etc/portage/package.keywords
echo dev-cpp/bakery >> /etc/portage/package.keywords
echo dev-cpp/libgdamm >> /etc/portage/package.keywords
emerge -av dev-db/glom

In words: I checked out the Break My Gentoo tree to /usr/local/bmg-main, added that directory as a Portage overlay, unmasked Glom and two of its dependencies, and then started the emerge process. I ran this on my AMD64 system on 2006-09-01 and it worked fine for me. I ended up with version 1.0.4 of Glom, 2.4.0 of bakery, and 1.3.7 of libgdamm from the Break My Gentoo tree. Everything else was installed from the standard Gentoo Portage tree.

After I added the Portage overlay directory, newer packages for much of my system became available, and I don’t want to stray outside Portage for most of my system software, so I removed the overlay from /etc/make.conf.

I already had the latest stable PostgreSQL installed from Portage, which is version 8.0.8. I decided to stay with this version instead of upgrading to version 8.1, which is considered stable on many other distributions, because I wanted to do a fairly “conservative” installation, mimicking the type of system a risk-averse business might want to build.

Configuring PostgreSQL

You need to create a database user that can create and edit databases. Glom also requires PostgreSQL to accept connections over TCP/IP, which it doesn’t do by default. If you already have PostgreSQL configured to allow remote connections, you can skip this step. Otherwise, you should follow the instructions on Glom’s website, which you can find by following a link from the Download page. Here’s what I did to set everything up correctly:

  1. I started PostgreSQL and added it to the default runlevel, so it will start automatically:
    /etc/init.d/postgresql start
    rc-update add postgresql default
  2. I added a PostgreSQL user called glom. For reasons I’ll explain in a bit, I also created a glom database:
    xaprb@tigger ~ $ su -
    root@tigger ~ # su - postgres
    postgres@tigger ~ $ createuser -P
    Enter name of user to add: glom
    Enter password for new user: 
    Enter it again: 
    Shall the new user be allowed to create databases? (y/n) y
    Shall the new user be allowed to create more new users? (y/n) y
    CREATE USER
    postgres@tigger ~ $ createdb glom
    CREATE DATABASE
    postgres@tigger ~ $ exit
    You don’t have to create the glom database, but it makes it easier to verify your user is set up correctly, as you’ll see later.
  3. As root, I edited the PostgreSQL configuration files to allow TCP/IP connections:
    root@tigger ~ # vim /var/lib/postgresql/data/postgresql.conf
    # I added the following line:
    listen_addresses = '*'
    root@tigger ~ # vim /var/lib/postgresql/data/pg_hba.conf
    # I added the following line:
    host    all         all         0.0.0.0 0.0.0.0               md5
    These steps took a careful eye; I made some mistakes at first, because I’m not that familiar with configuring PostgreSQL.
  4. I restarted PostgreSQL. It is not necessary to restart the computer:
    /etc/init.d/postgresql restart
    If you made mistakes, as I did, you may think it restarted, but it actually didn’t. I took a look at the log to see what was the matter:
    root@tigger ~ # tail /var/lib/postgresql/data/postgresql.log
    FATAL:  unrecognized configuration parameter "listen_address"
    FATAL:  unrecognized configuration parameter "listen_address"
    I needed to say listen_addresses in postgresql.conf. Unfortunately Gentoo thought PostgreSQL was already started, so trying to start or stop it failed; I had to “zap” it to proceed:
    /etc/init.d/postgresql zap

Eventually I got the user and database created, and PostgreSQL listening for TCP/IP connections.

At this point, I recommend you make sure the user you created can connect to PostgreSQL. If it can’t, you may have misconfigured PostgreSQL, created the user wrong, or had some other trouble. To verify all is well, connect via the command-line client:

xaprb@tigger ~ $ psql -h localhost -W -U glom
Password: 
Welcome to psql 8.0.8, the PostgreSQL interactive terminal.

If you see the welcome message, everything is fine. Explicitly specifying -h localhost causes psql to connect via TCP/IP, so this is a good way to verify your configuration is correct. This is why I created a glom database before, by the way. If I didn’t do that, my connection attempt would be rejected because it automatically tries to connect to a database named the same as the user. Maybe a PostgreSQL expert can write in a better way to do this, but I’m still new at this, so I just took the simple route.

Starting Glom

Once the configuration is all done, you’re ready to start Glom. I recommend you do this by opening a terminal and typing glom, so you have a place to see any debugging output, instead of starting it from a system menu. This helped me troubleshoot configuration issues.

In Xfce, my preferred graphical environment, Glom appears under the “Office” entry in the desktop menu. I don’t know where it is in Gnome or KDE.

If all is well, Glom will start and ask you if you want to create a new database, create a new database from a sample, or open an existing database.

Configuration troubles

At this point, I ran into some troubles. The first issue was Glom couldn’t connect to PostgreSQL, even though I could connect on the command-line. Fortunately a quick web search turned up the issue: libgda, which Glom uses as a database abstraction library, wasn’t compiled with support for PostgreSQL. This is my fault. I have the postgresql USE flag turned off globally in /etc/make.conf, to avoid building extra dependencies for many packages. I added the USE flag and re-compiled the library:

root@tigger ~ # echo gnome-extra/libgda postgres >> /etc/portage/package.use
root@tigger ~ # emerge libgda

This solved the connection issues. Now I could log in as the glom user I created before, and I was able to create a new database, but I couldn’t switch Glom from Operator mode to Developer mode. I could create an empty database, but pretty much nothing else. I couldn’t create tables, and many menus and menu entries were grayed out. When I tried to use the User Level menu to switch from Operator to Developer mode, Glom displayed an error dialog saying “Developer mode not available. Developer mode is not available. Check that you have sufficient database access rights and that the glom file is not read-only.” Here’s a screenshot:

This also turned out to be a minor configuration issue, but Glom’s documentation is a bit thin at the time of writing, so it took me a while to understand what was wrong. Fortunately, because I’d started Glom from a terminal, I was also able to see some debugging output, which gave me a hint:

DEBUG: User=glom is _not_ in the developer group on the server.

My first attempt to solve this was just to add a developer group in PostgreSQL, and add the glom user to the group:

xaprb@tigger ~ $ psql -h localhost -W -U glom
   [ snipped  some output ]
glom=# create group developer;
CREATE GROUP
glom=# alter group developer add user glom;
ALTER GROUP
glom=# \q

This didn’t solve the problem. I dropped the developer group to clean up the clutter I’d just created, and tried web searches. I saw some messages on the Glom mailing list about the same problem, starting with this thread about problems with Glom not letting the user switch to Developer mode. According to the message and its follow-ups, Glom wanted my user to belong to a group called glom-developer, not just developer. This still didn’t solve the issue, though!

I was getting a bit frustrated with the lack of documentation, and was just about to go read the source code and find the problem, but I stumbled upon another message on the mailing list. This message said the issues could be solved by creating a new database from the provided sample file, which would set up any needed groups in PostgreSQL. I opened the example, which was in /usr/share/glom/doc/examples/, and it added a group called glom_developer (underscore, not hyphen). This fixed the problem at last.

Though this was a minor issue, it really was annoying. I’ve since edited the installation documentation to indicate the required group memberships. If you want to configure PostgreSQL correctly without using the example database, run the following from within psql:

glom=# create group glom_developer;
CREATE GROUP
glom=# alter group glom_developer add user glom;
ALTER GROUP

This is for PostgreSQL prior to version 8.1, which has a significantly different permission system; I imagine in 8.1 you would say CREATE ROLE... ALTER ROLE, though I’ve not tried it.

Creating a simple database

Finally, I was really ready to create a simple database project. I decided to create a database that would keep track of servers and software in a corporate network. I wanted to store information about servers, software, and which software is installed on which servers. When Glom showed me the initial screen, I chose “New,” typed the name of the XML file defining the database (”servertest”), and accepted the default database title (”Servertest”). Glom next presented me with a login dialog, asking me to connect to PostgreSQL:

Glom screenshot: connecting to the database

It filled the username field with my Unix username, not “glom,” so I changed that to “glom” and typed the PostgreSQL password. The next screen I saw was the “Tables in database” dialog, which showed no tables. I created three tables: Server, Program, and Installation. This screen doesn’t allow you to define columns for tables; all you can do is create named tables:

Glom screenshot: tables in database

After I closed this dialog, Glom displayed the Installation table in list view, I guess because it’s the first table in alphabetical order:

Glom screenshot: table contents in list mode

Glom creates three columns for every table by default: description, comments, and an auto-incrementing integer ID field. I decided not to change the Server and Program tables, and just accepted the defaults, but I needed to record which machine and program an installation represents. I chose the Developer->Fields menu to edit the Installation table, and added columns called server and program:

Glom screenshot: defining fields in a table

Then I switched to the Details view of the Installation table, which will eventually allow me to enter information about where a program is installed:

Glom screenshot: table contents in details mode

This isn’t very useful for entering data as it is. I want a pull-down menu of servers and programs, and I want those fields first in the display, before the Description and Comments fields. To create the pull-down menus, I first defined relationships between the tables. I selected the “Developer->Relationships for this table” menu entry, which showed the Relationships dialog. I added relationships from my newly added columns to the ID columns of the Program and Server tables:

Glom screenshot: defining relationships

Next I chose the Developer->Layout menu, which shows the fields and some controls for formatting and editing them:

Glom screenshot: layout mode

I selected the server field and clicked the Formatting button to control how it’s presented. I chose “Use custom formatting” and selected “Choices from Related Records,” then chose the Installation_To_Server relationship. I also chose to restrict the data to these choices, so a user can’t enter a server that doesn’t exist:

Glom screenshot: defining field formatting

I repeated the process for the program field. Next, I brought those fields up to the front of the display, by selecting them in the Layout dialog and using the Up buttons. After I accepted these changes, the Details view looks much more usable:

Glom screenshot: details mode after changing field formatting

If I had not restricted the data to the choices in the related records, the pull-down menus would have been combo boxes, and if I had selected the checkbox in the Relationships dialog to allow editing related records, changes made by typing in the combo boxes could