Archive for September, 2007

How to measure MySQL slave lag accurately

Kevin Burton wrote recently about why SHOW SLAVE STATUS is really not a good way to monitor how far behind your slave servers are, and how slave network timeouts can mess up the slave lag. I’d like to chime in and say this is exactly why I thought Jeremy Cole’s MySQL Heartbeat script was such a natural fit for the MySQL Toolkit. It measures slave lag in a “show me the money” way: it looks for the effects of up-to-date replication, rather than asking the slave how far behind it thinks it is.

The slave doesn’t even need to be running. In fact, the tool doesn’t use SHOW SLAVE STATUS at all. This has lots of advantages: for example, it tells you how far the slave lags behind the ultimate master, no matter how deep in the replication daisy-chain it is. In other words, unlike SHOW SLAVE STATUS, it won’t tell you a slave is up-to-date just because it’s caught up to its master. If a slave’s master is an hour behind, it will report that the slave is an hour behind, too — because it is.

It’s a really smart approach. And you can daemonize it, and it’ll keep a file up-to-date with running averages (by default it averages the last one, five and fifteen minutes, but of course you can choose that). Now your monitoring scripts can be as simple as “cat /var/log/slave-delay” or some such.

It’s not a hard tool to write, and I suspect lots of people have done it, but I bet that between Jeremy, whoever worked on it at Six Apart, and me, we’ve produced a pretty good version of the tool. It’s part of the MySQL Toolkit, and the full manual is online.

Technorati Tags:, , , , , , ,

You might also like:

  1. MySQL Toolkit version 896 released
  2. How to sync tables in master-master MySQL replication
  3. Why MySQL says the server is not configured as a slave
  4. Introducing MySQL Slave Delay
  5. How to know if a MySQL slave is identical to its master

Introducing MySQL Parallel Dump

A while ago Peter Zaitsev wrote about his wishes for mysqldump. These included multi-threaded dumps and “safe” dumps that would wait for a server to restart if it crashed, then keep dumping other tables. I’ve had sketches of this done for a while, but during this week I fleshed it out while writing about backup and recovery for our upcoming book. I had my own list of features I wanted to add:

  • Support for backup sets, with the backup configuration stored in the database itself.
  • Emphasis on tab-delimited dumps.
  • Sane defaults, focused on ease of use and ease of recovery.
  • Support for compression by default.

The resulting script is satisfactory to me. If you just run it without arguments, it connects to the server mentioned in your .my.cnf file and dumps all databases and tables, one file per table, gzipped, in parallel (at least two, but by default it detects the number of CPUs and runs that many in parallel).

baron@kanga $ mysql-parallel-dump
SET     DATABASE TABLE                      TIME STATUS THREADS
default mysql    columns_priv                  0      0       2
default mysql    db                            0      0       2
default mysql    help_category                 0      0       2
default mysql    func                          0      0       2
default mysql    help_keyword                  0      0       2
...snip...
default test     t1                            0      0       2
default test     t2                            0      0       1

You can tell it to dump elsewhere, and it’s easy to dump all tables in tab-delimited format. Here it’s reading its configuration from the database, writing to /tmp, and not backing up tables that have been dumped in the last 5 minutes:

baron@kanga $ mysql-parallel-dump --basedir /tmp --tab --sets set1 \
    --settable test.backupset --age 5m
Nothing to do for set set1
baron@kanga $ ls -lR /tmp/set1
/tmp/set1:
total 8
-rw-rw-rw- 1 baron baron   40 2007-09-30 21:43 00_master_data.sql
drwxrwxrwx 2 baron baron 4096 2007-09-30 21:43 test

/tmp/set1/test:
total 16
-rw-rw-rw- 1 baron baron 549 2007-09-30 21:43 t1.sql.gz
-rw-rw-rw- 1 baron baron  31 2007-09-30 21:43 t1.txt.gz
-rw-rw-rw- 1 baron baron 550 2007-09-30 21:43 t2.sql.gz
-rw-rw-rw- 1 baron baron  29 2007-09-30 21:43 t2.txt.gz

And as you can see, it knows I’ve dumped those tables recently and didn’t do them again. Pretty handy for scheduling and resuming backups, no? It makes it easy to keep going if something happens in the middle of the backup and you want to restart.

I’m aware of the similar mysqlpdump script, and I generally don’t like duplicating other people’s efforts, but I decided to go ahead and finish what I’d started. To tell you the truth, neither script is complicated. It’s just a matter of providing a sensible wrapper around existing functionality (in my case, that’s mysqldump and SELECT INTO OUTFILE, which I do directly rather than asking mysqldump to do it with -T, which just makes mysqldump into the same kind of wrapper). I also wanted to provide it as part of the MySQL Toolkit, so it’s all in the same place. Frankly, I also built in a lot more functionality than mysqlpdump has, and I consider the defaults to be more useful. I’d love for mysqldump itself to have better defaults — especially for dumping large datasets, which it’s frankly pretty poor at right now. One of these days MySQL AB will make me obsolete, I just know it…

Oh, and in keeping with my tradition, it’s sort of ridiculously sophisticated and overly generic. It has a little macro language that you can use to basically turn it into a loop iterator over the selected databases and tables, and run any command you wish. Here’s an example:

mysql-parallel-dump -- mysqldump --skip-lock-tables '%D' '%N' \| gzip --fast -c - \> '%D.%N.gz'

That basically duplicates the built-in defaults (except the defaults are actually a lot more complicated than that). But it illustrates how you could use this as a shell to select which tables to dump and fork off sub-processes, handling all the locking, error checking, and so forth for them. Here I’m spawning off mysqldump, but it would be just as easy to execute a custom script.

There’s one more wish Peter and I both have, but which is impossible for right now as far as we know. That’s to do parallel SELECT INTO OUTFILE dumps for a bunch of tables in one transaction. This will not be possible until more than one connection can participate in a single transaction. Ask the MySQL guys about that one!

This script is part of MySQL Toolkit and will be released as soon as I have time. There are a few other bug fixes I want to include. In the meantime, if you’re dying to get it, you can grab it from the MySQL Toolkit subversion repository.

Technorati Tags:, , , , , , ,

You might also like:

  1. MySQL Toolkit version 946 released
  2. Introducing MySQL Parallel Restore
  3. Progress on High Performance MySQL Backup and Recovery chapter
  4. MySQL Toolkit version 1254 released
  5. Introducing MySQL Deadlock Logger

MySQL Toolkit version 896 released

Download MySQL Toolkit

This release of MySQL Toolkit adds a new tool, fixes some minor bugs, and adds new functionality to several of the tools.

New tool: MySQL Heartbeat

This tool was contributed by Proven Scaling’s Jeremy Cole and Six Apart. It measures replication delay on a slave, which can be daisy-chained to any depth. It does not rely on SHOW SLAVE STATUS, and in fact it doesn’t even need the slave processes to be running. You could use it to measure replication delay on your own hand-rolled replication, if you wanted.

The most common way to use it is to run one process to update a heartbeat on the master, and another to monitor the lag on a slave (you can run as many as you wish to monitor multiple slaves). By default it prints moving averages of delay over one, five and fifteen-minute time windows:

   0s [  0.00s,  0.00s,  0.00s ]
   0s [  0.00s,  0.00s,  0.00s ]
   1s [  0.02s,  0.00s,  0.00s ]
   2s [  0.05s,  0.01s,  0.00s ]
   3s [  0.10s,  0.02s,  0.01s ]
   4s [  0.17s,  0.03s,  0.01s ]
   0s [  0.17s,  0.03s,  0.01s ]
   0s [  0.17s,  0.03s,  0.01s ]
   0s [  0.17s,  0.03s,  0.01s ]

(of course, I couldn’t resist making that configurable, so you can specify your own time windows).

You can also run it as a daemon. Running the update process as a daemon is intuitive. Running the monitoring process isn’t quite as obvious, because a daemon should re-open STDOUT to /dev/null. What you can do is give it the –file argument and it’ll keep a file current with the most recent line of output, which you can check anytime you want to see how your slave has been doing over the last X time windows.

Changelog

Here’s a changelog for the other tools I updated in this release:

Changelog for mysql-deadlock-logger:

2007-09-20: version 1.0.4

   * Added --interval, --time, and --daemonize options, and signal handling.
   * --askpass did not allow different passwords on --source and --dest.

Changelog for mysql-duplicate-key-checker:

2007-09-20: version 1.1.1

   * Exit code wasn't always defined.

Changelog for mysql-query-profiler:

2007-09-20: version 1.1.5

   * Documentation didn't specify how queries in FILE are separated.

Changelog for mysql-slave-delay:

2007-09-20: version 1.0.1

   * Added a --daemonize option to detach from the shell and run in the background.

Changelog for mysql-slave-restart:

2007-09-20: version 1.0.1

   * Added a --daemonize option to detach from the shell and run in the background.

Changelog for mysql-table-checksum:

2007-09-20: version 1.1.15

   * The CHECKSUM strategy was always disabled.

Changelog for mysql-visual-explain:

2007-09-20: version 1.0.3

   * filesort wasn't applied to the first non-constant table.
Technorati Tags:, , , , , ,

You might also like:

  1. How to measure MySQL slave lag accurately
  2. Maatkit version 1877 released
  3. Introducing MySQL Slave Delay
  4. MySQL Toolkit version 815 released
  5. Maatkit version 1508 released

High Performance MySQL, Second Edition: Backup and Recovery

Progress on High Performance MySQL, Second Edition is coming along nicely. You have probably noticed the lack of epic multi-part articles on this blog lately — that’s because I’m spending most of my spare time on the book. At this point, we have significant work done on some of the hardest chapters, like Schema Optimization and Query Optimization. I’ve been deep in the guts of those hard optimization chapters for a while now, so I decided to venture into lighter territory: Backup and Recovery, which is one of the few chapters we planned to “revise and expand” from the first edition, rather than completely writing from scratch.

Since we decided to take that approach, I began by following the outline from the first edition, and figured I’d re-read the first edition’s chapter and re-outline, then add more material as appropriate. To my surprise, I found this chapter in the first edition is one of the most cursory (I don’t mean to criticize too much — you’ll see where I’m going with this in a second). It’s quite short and doesn’t really discuss recovery at all, despite the chapter title. There’s one sub-section titled “Recovery,” but it’s only a few paragraphs, and mostly discusses dumping, not recovery! [Edit: whoops, I see each subsection in the “Tools and Techniques” has a few words about how to restore backups created with that specific tool. But there’s still not much general advice about how to restore backups.]

The chapter devotes a lot of space to code listings and such, and not enough on how to do high-performance backups in a high-performance application, in my opinion. I quickly decided it needs to be significantly expanded, not just updated, and I scrapped the original text and became more liberal with the outline. I’m referring to the first edition as I write, but I’m not keeping any of the text. Chalk it up to perfectionism.

The outline, as I have it so far, is as follows. If you compare it to the first edition, you’ll see I’ve rearranged it quite a bit:

1  Why Backups?
   (very brief, even more so than the first edition)
2 Considerations and Tradeoffs
   2.1 How Much Can You Afford to Lose?
   2.2 Online or Offline?
   2.3 Dump or Raw Backup?
   2.4 Onsite or Offsite?
   2.5 What to Back Up
   2.6 Storage Engines and Consistency
   2.7 Replication
3 Restoring from a Backup
   3.1 Copying Files Across the Network
   3.2 Starting MySQL
   3.3 Point-In-Time Recovery
4 Tools and Techniques
   4.1 mysqldump
   4.2 mysqlhotcopy
   4.3 Zmanda Recovery Manager
   4.4 InnoDB Hot Backup
   4.5 Offline Backups
   4.6 Filesystem Snapshots
   4.7 MySQL Global Hot Backup
   4.8 Automating and Scripting Backups
5 Rolling Your Own Backup Script

At this point, I have written sections 1, 2 and 3, which are about 11 pages in OpenOffice.org (compare to 6 pages on paper in the first edition). I’m sure this will only grow as other things occur to me. The outline of section 4 is completely open to change, and section 5 might not even happen; if you can script, you can script. Otherwise, you might want to use one of the tools listed in section 4. All in all, I’d say we’re looking at about 25 to 30 pages, just based on what’s in my head and not yet written down.

Now, to come to my point: what would be helpful to you? Are there any challenges you’d like me to cover, such as how you back up a data warehouse with terabytes of data? (I’ve already done that, in What To Back Up, but feel free to ask anyway.) Are there challenges you have had to solve, which you think would be very helpful to others? This chapter is largely open to suggestion at this point. If you tell me/us what you’d like to see, this is your opportunity to get at least four experts to solve your problems in-depth.

The usual disclaimers apply: no guarantees, this is all open to change, this is top-secret pre-production material anyway and you never saw this web page. What is the first rule of Fight Club, again?

I’m looking forward to your feedback.

Technorati Tags:, , , ,

You might also like:

  1. Progress on High Performance MySQL Backup and Recovery chapter
  2. High Performance MySQL, Second Edition: Advanced SQL Functionality
  3. High Performance MySQL, Second Edition: Query Performance Optimization
  4. High Performance MySQL, Second Edition: Replication, Scaling and High Availability
  5. Progress on High Performance MySQL, Second Edition

Version 1.5.2 of the innotop MySQL monitor released

Download innotop

This release is part of the unstable 1.5 branch. Its features will ultimately go into the stable 1.6 branch. You can download it from the innotop-devel package.

The major change is I’ve ripped out the W (Lock Waits) mode and enabled innotop to discover not only what a transaction is waiting for, but what it holds too. The new mode that replaces W is L (Locks). My last article goes into more detail on this.

Technorati Tags:, , , , ,

You might also like:

  1. How to debug InnoDB lock waits
  2. How to monitor InnoDB lock waits
  3. Version 1.6.0 of the innotop monitor for MySQL released
  4. Version 1.5.1 of the innotop MySQL monitor released
  5. A look at innotop’s new features

How to debug InnoDB lock waits

This article shows you how to use a little-known InnoDB feature to find out what is holding the lock for which an InnoDB transaction is waiting. I then show you how to use an undocumented feature to make this even easier with innotop.

Background

One of the most common complaints I’ve heard from DBAs used to other database servers is “I can’t find out who holds the locks that are blocking all these connections and making them time out.” I feel your pain. Before I helped scale my employer’s systems to deal with larger volumes of data, InnoDB lock contention was a serious issue. And as far as I knew, you couldn’t find out who was holding locks. I knew you could see who was waiting for locks to be granted; that’s easy. You just run SHOW INNODB STATUS and look for the following text:

------------
TRANSACTIONS
------------
Trx id counter 0 4874
Purge done for trx's n:o < 0 4869 undo n:o < 0 0
History list length 21
Total number of lock structs in row lock hash table 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 4873, ACTIVE 6 sec, process no 7142, OS thread id 1141152064 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368
MySQL thread id 9, query id 173 localhost root Sending data
select * from t1 for update
——- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 9 page no 3 n bits 72 index `PRIMARY` of table `test/t1` trx id 0 4873 lock_mode X waiting
…

That’s fine, but who holds the lock? I thought there was no way to find that out.

InnoDB Lock Monitor

Until I learned about the InnoDB Lock Monitor, that is. You enable it by running the following command:

CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;

It’s quite an ugly hack, but it turns out the table name is actually “magical.” It’s a special table name that tells InnoDB to start the lock monitor. You can stop it by dropping the table again.

This little-noticed feature makes InnoDB print out a slightly modified version of what you see with SHOW INNODB STATUS. The “slight modification” is to print out not only the locks the transaction waits for, but also those it holds. For example, here’s the transaction that holds the locks:

---TRANSACTION 0 4872, ACTIVE 32 sec, process no 7142, OS thread id 1141287232
2 lock struct(s), heap size 368
MySQL thread id 8, query id 164 localhost root
TABLE LOCK table `test/t1` trx id 0 4872 lock mode IX
RECORD LOCKS space id 9 page no 3 n bits 72 index `PRIMARY` of table `test/t1` trx id 0 4872 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 000000000d35; asc      5;; 2: len 7; hex 800000002d0110; asc     -  ;;

That’s fine, but there are, ah, limitations. As the manual says, InnoDB periodically prints out this text — essentially spewing InnoDB’s guts — to its standard output. This gets redirected to the server error log in any sane installation. Who’s looking there? And it gets printed out at long intervals, which seems to be about every 16 seconds on the machines I use.

Plus, if you’ve looked at the result, you’ll understand this is not something you want to search through manually looking for data. The output can be absolutely huge. What DBA wants to pore over thousands of hex-dumped rows from the table just to answer the question “who holds that lock?”

All in all, this is not very convenient (yep, I know that’s an understatement).

Slightly more convenient

What’s a little more convenient than combing through all that text by hand is writing a program to parse InnoDB’s status output. You don’t have to, though. That’s what I wrote innotop to do. And I’ve just released version 1.5.2, which at long last has the ability to watch a log file as well as connecting to server(s).

Here’s how this works: you start innotop, and press the L key to switch to Lock mode. This replaces the old Lock Wait mode, which was only able to monitor the InnoDB lock waits you see in the normal output of SHOW INNODB STATUS.

This mode shows you something like the following:

_____________________________ InnoDB Locks __________________________
CXN   ID  Type    Waiting  Wait   Active  Mode  DB    Table  Index
file  12  RECORD        1  00:10   00:10  X     test  t1     PRIMARY
file  12  TABLE         0  00:10   00:10  IX    test  t1
file  12  RECORD        1  00:10   00:10  X     test  t1     PRIMARY
file  11  TABLE         0  00:00   00:25  IX    test  t1
file  11  RECORD        0  00:00   00:25  X     test  t1     PRIMARY

That’s helpful! I can see the locks held and waited for in a nice tabular format. It’s pretty easy to see connection 11 is blocking connection 12.

This is still pretty inconvenient, though. To get access to the server’s error log, I have to run innotop on the database server machine itself. Is there a better way?

Even better

There is, in fact, but I discovered it completely by accident. It’s not documented, but the extra information doesn’t just get printed to the server log. It also shows up in SHOW INNODB STATUS! Now that’s a nice surprise. It means innotop can get lock information from a normal connection instead of monitoring a log file.

After discovering this, I immediately added some more features to innotop. There are now hot-keys in L mode to enable and disable the lock monitor. Now you can press L, press the ‘a’ key to start the lock monitor, see what’s blocking the waiting transaction, press ‘o’ to stop the lock monitor, and you’re done.

Best yet

I’m sure you InnoDB administrators already recognize what an improvement this is over the options you previously had (essentially, you didn’t have any). There’s still a long way to go, though. Locks could be in the INFORMATION_SCHEMA or in a SHOW LOCKS command. I won’t speculate on why they aren’t already.

Of course, the upcoming Falcon storage engine already has better features for debugging lock contention than this. But my guess is it’ll be a long time before Falcon has the market share InnoDB has. All things considered, InnoDB is a pretty nice piece of software.

Conclusion

Download innotop

The conclusion to this whole article is: use innotop if you use InnoDB. Heck, use it if you use MySQL at all. It makes a lot of things a lot easier, not just debugging InnoDB lock contention. Feedback is welcome — just use the Sourceforge bug tracker, forums, and mailing lists.

Technorati Tags:, , , , ,

You might also like:

  1. How to monitor InnoDB lock waits
  2. How I patched InnoDB to show locks held
  3. How to find out who is locking a table in MySQL
  4. Version 1.5.2 of the innotop MySQL monitor released
  5. A little-known way to cause a database deadlock

Version 1.5.1 of the innotop MySQL monitor released

Download innotop

This release is part of the unstable 1.5 branch. Its features will ultimately go into the stable 1.6 branch. You can download it from the innotop-devel package.

The major change is a new Command Summary’ mode (switch to this mode with the ‘C’ key) that’s similar to mytop’s ‘c’ mode. It shows you the relative size of variables from SHOW STATUS and SHOW VARIABLES. Here’s a sample:

Command Summary (? for help) localhost, 25+07:16:43, 2.45 QPS, 3 thd, 5.0.40

_____________________ Command Summary _____________________
Name                    Value    Pct     Last Incr  Pct    
Select_scan             3244858  69.89%          2  100.00%
Select_range            1354177  29.17%          0    0.00%
Select_full_join          39479   0.85%          0    0.00%
Select_full_range_join     4097   0.09%          0    0.00%
Select_range_check            0   0.00%          0    0.00%

The default is to show the Com_* variables, but I’ve used a different prefix to illustrate that you can view any variables you want. You just choose the prefix. Useful ones are Select_, Handler_ and Sort_. This gives you instant insight into the kind of work your server is doing. You can see in the sample above that the kinds of joins the server does is healthily balanced towards scans and ranges on the first table. The server does very few full joins, full range joins, and range-check query plans (this is good).

The example shows one server, as you can see by the first line. Naturally, you can monitor many servers in aggregate, and it’s configured to do this by default if you’re watching more than one server. However, there’s a bug in the percentage columns when you do that (the Value columns are accurate when aggregated). I have a fix in mind for that, which will also fix many other things that cause me (and you) too much work when customizing innotop. But that’ll come later. I feel this is good enough for now, since the main use for this mode is when you’re just trying to familiarize yourself with a server, perhaps at a consulting job, or when reading someone’s tuning tutorial or the like.

Technorati Tags:, , , ,

You might also like:

  1. The innotop MySQL and InnoDB monitor
  2. How to monitor MySQL status and variables with innotop
  3. Version 1.6.0 of the innotop monitor for MySQL released
  4. Version 0.1.106 of innotop MySQL/InnoDB monitor released
  5. Version 0.1.123 of innotop released

innotop 1.5.0 released

Download innotop

Version 1.5.0 of the innotop MySQL and InnoDB monitor is out. This release is the first in the unstable 1.5.0 branch, which will eventually become the stable 1.6 branch. I’m beginning to merge the various branches I’ve made to support some of our needs at my employer. This first release adds some major new features and prepares for some other large improvements and new features.

What’s new

Here’s what’s new:

  • Added plugin functionality.
  • Added group-by functionality.
  • Moved the configuration file to a directory.
  • Enhanced filtering and sorting on pivoted tables.
  • Many small bug fixes.

Plugins

Plugins let you hook custom code into innotop. Your custom Perl module can extend or change innotop without touching its source code, and all you have to do is drop it into a directory and activate it (sound familiar to you WordPress users?). As an example of how this is useful, about two dozen lines of code lets me add “program” and “unix_pid” columns into the Query List and InnoDB Transaction List modes. These show the originating program and PID for connections by querying tables in which this data is stored. The plugin adds the columns and expressions for them, and then adds the data in by using innotop’s own DBI connections.

There’s an example plugin in the documentation.

Grouping

This functionality lets you apply something like a SQL GROUP BY to a table. There are some built-in rules (press the ‘=’ key in Q or T mode; it’s easier if you hide the header with the ‘h’ key first).

The built-in rules let you group connections or transactions by status. They also automagically show a ‘count’ column, which is there but hidden until the grouping is applied. Now you can see how many connections are in what status. Here’s a screenshot of before and after:

innotop ungrouped

innotop grouped

You can toggle this on and off easily with the ‘=’ key on any table. (Most tables don’t have default group-by expressions, though, so you’ll have to read the docs to learn more about that. As with any features, let me know if you have a useful default you want me to include in innotop).

Notes

Don’t be scared by the “unstable” designation. It only means that I’m getting ready for a lot of changes that don’t belong in a stable branch; this release should generally be as good quality as any other. And I don’t want to use a naming scheme like “innotop-6.0-pre-alpha-1_rel5″. When I release a version I don’t think is good quality, I’ll let you know ;-) Generally I’m going to confine that code to the Subversion repository.

As an aside, both this and the MySQL Toolkit project are becoming more popular, and as that happens, I’m also getting busier — among other things, I’m writing a book! I must say SourceForge is great in some ways for helping to manage the project, but a lot of extra work in others. For example, it created a bunch of default forums, trackers, and settings when I created the projects, and that’s been pretty hard to slog through. The documentation system is not useful for my project. I think I’ve finally figured out how to get emails when people submit bug reports. I’m also trying to automate the tedious release process as much as I can, and it’s not proving easy. I don’t mean this to be a litany of woes, because I know they and I are doing our respective bests; it’s more of a commentary on the increased work that comes with a “generic, flexible” system — which is what people always seem to want, until they get it. I’m sure you all know what I mean!

Please go download, use, write plugins, and find and report bugs (via the sourceforge tracker, of course)! And happy innotop-ing.

Technorati Tags:, ,

You might also like:

  1. Version 1.5.2 of the innotop MySQL monitor released
  2. Version 1.6.0 of the innotop monitor for MySQL released
  3. A look at innotop’s new features
  4. Version 1.5.1 of the innotop MySQL monitor released
  5. What I’ve been doing lately

pair Networks is now carbon-neutral

I’m a big fan of pair Networks, my hosting company. Their service has been outstanding; the few times I’ve ever had a glitch with my shared hosting, they have been responsive beyond the call of duty and done whatever it takes to fix the issue. I use them to host not only a half-dozen of my own sites, but family and client sites as well, plus some other groups I’m involved with. It has been a uniformly excellent experience.

Now I see pair Networks has gone carbon-neutral too. While such labels can be abused, and I wouldn’t really trust this announcement from just anyone, I trust them. I’m happy to see them trying to reduce their environmental impact. Go pair, go!

Side note: pair is not the cheapest (and I’m accepting gifts if you feel the urge), but every other hosting provider I’ve heard people rave about for cheapness eventually ends up being a sore point — even the biggest in the industry — I’ll name no names. Sometimes there’s no way to know if someone is good without trying them for six months and seeing how they handle problems. I am also involved with enterprises that use Blue Ridge InternetWorks, who is also top-notch and employs a number of people I respect a lot.

Technorati Tags:, , , , ,

You might also like:

  1. Four companies to sponsor Maatkit development
  2. How to install beautiful X11 cursors

Organizing High Performance MySQL, 2nd Edition

I mentioned earlier that I’d blog about progress on the book as we go. It’s not only progress on the book itself — I want to write about the process of writing, because I think it’s very interesting and relevant to software engineering. I’m finding a lot of the work in writing a book comes from some of the same things that make software hard: coordinating work, deciding what should go where, and so on.

As I mentioned in the last article, this book is going to be much bigger than the first edition. There are places where we’re working from the first edition as a baseline, but they’re really a small part of the book. Sections have become chapters; appendices have become chapters. Topics become sections. Bulleted lists become sections too.

We (as a team) have deep expertise on a pretty broad spectrum of MySQL. Take any point in the first edition — here, I’ll open it randomly and find a page. Okay, that one was about GRANT… maybe I’ll find another one ;-) Page 68, “Index Structures”. This section in the first edition gives a couple of paragraphs to B-Tree indexes. We are probably going to write many pages and have diagrams. Not that you don’t know how B-Tree indexes work, but there are a lot of things to think about: what kinds of queries can you satisfy efficiently with them? What’s the memory cost of a B-Tree index? How can you use them to simulate hash indexes on storage engines that don’t (yet) support hash indexing? What are some useful hacks you can do? What about fragmentation, fill factor, and so on? Inserting in sorted order is a worst-case scenario in one way because it causes the most re-balancing, but does that matter overall? (As it turns out, it doesn’t — page fill factor and fragmentation trump re-balancing cost).

This kind of depth in the material is great, of course. It means you can learn about things you need to hone MySQL for a specific scenario. Though MySQL performs well as a general-purpose database server, a lot of people striving for high performance need to push the server really hard in a specific problem. Think about del.icio.us, for example. Imagine the queries they run. They’re far from general-purpose! Including specific details in such depth is very helpful for people trying to solve specific problems.

But it makes for an interesting and difficult challenge for us as authors: we have to figure out how to organize the material so you can use it. In some ways, it is a classic multiple hierarchy problem. Chapters, sections and subsections are a hierarchy. That’s the way books work, but one hierarchy can never adequately address multi-dimensional data, and MySQL is definitely a multi-dimensional topic.

Let me give you an example: we have chapters on architecture, query optimization, and schema optimization. Each of these topics has storage-engine-specific details. We can place all the details in a section titled “Engine-Specific Notes,” but then where will you go to learn about each storage engine? You’ll have to read every chapter’s notes section. We could stuff it all into a chapter called “Storage Engines,” but that chapter would hardly make sense without discussing a lot of architecture, queries, and schema optimization, would it?

Ultimately this problem is not solvable in a static book, which can only have one hierarchy. If it were a data warehouse, we could give you multiple dimensions and let you drill into the topics any way you please. In a book, the best we can do is try to arrange things where they make the most sense and seem to go with the other material the best, and then give you cross-references and a great index.

This is just one of the interesting challenges in writing that is very reminiscent of good software engineering, where code needs to be massaged into the place where it fits best. Actually, code is easier than this, because in a well-designed system, there’s usually just one best place for some bit of functionality to go. There’s usually no single best place in a book.

Working with multiple authors who have different talents and expertise also reminds me of collaborative programming, but maybe I’ll write about that another time.

Technorati Tags:, , ,

You might also like:

  1. High Performance MySQL, Second Edition: Replication, Scaling and High Availability
  2. Progress on High Performance MySQL, Second Edition
  3. High Performance MySQL, Second Edition: Schema Optimization and Indexing
  4. High Performance MySQL, Second Edition: Backup and Recovery
  5. A review of Pro MySQL (Kruckenberg and Pipes, Apress 2005)