Archive for November, 2007

Progress on Maatkit bounty

My initial plans got waylaid! I didn’t pull out the checksumming code first, because the code wasn’t at all as I remembered it. Instead, I began writing code to handle the more abstract problem of accepting two sets of rows, finding the differences, and doing something with them. I’m ending up with a slightly more complicated system than I thought I would. However, it’s also significantly simpler in some ways. Instead of just passing references to subroutines to use as callbacks, I’m object-ifying the entire synchronization concept.

What’s the advantage of doing this? Well, as some of you may know, there are two fairly complex algorithms in the tool at present, which handle synchronization in a hierarchical manner, zooming in on the rows that need to be changed. There are a lot of complexities in them. If I wrap all that up into modules, and make them have a uniform interface (real OO interfaces would be delightful here, but Perl doesn’t support them), I can simplify the project significantly by…

…throwing them out the window! That’s right, I’m tossing out the ‘top-down’ and ‘bottom-up’ algorithms. What I want to develop, first and foremost, is the code that does the synchronization, not the really twisted code that does bitwise XORs on groupwise slices of checksums and has recursion and all that stuff. So I decided on a generic data-syncing interface, and wrote the simplest possible implementation of that, which I’m going to use to help me deal with complexity. This algorithm is called ‘stream’ (for lack of a better word). It has no hierarchical drill-down or any other complexities. It amounts to “select * from source, select * from dest, diff and resolve.”

It’s not a very efficient algorithm for comparing and syncing data, at least not by my standards. (It amounts to a FULL OUTER JOIN implemented in Perl.) But boy, does it make it easier to start cleaning up the nasty spaghetti code that handles locking, waiting for a slave to catch up, actually changing the data that turns out to be different, and so on.
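
To make the “FULL OUTER JOIN” idea concrete, here is a minimal sketch in Python rather than Perl. It is not the tool’s code: the function names, the row format (tuples whose first element is the key), and the assumption that both streams arrive sorted by that key are all made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple  # assumption: a row is a tuple and row[0] is its key


def full_outer_join(source: Iterable[Row], dest: Iterable[Row]) -> Iterator[Tuple[Optional[Row], Optional[Row]]]:
    """Merge two key-ordered row streams, pairing rows that share a key."""
    src_iter, dst_iter = iter(source), iter(dest)
    src, dst = next(src_iter, None), next(dst_iter, None)
    while src is not None or dst is not None:
        if dst is None or (src is not None and src[0] < dst[0]):
            yield src, None  # key exists only in source
            src = next(src_iter, None)
        elif src is None or dst[0] < src[0]:
            yield None, dst  # key exists only in dest
            dst = next(dst_iter, None)
        else:
            yield src, dst  # same key on both sides
            src, dst = next(src_iter, None), next(dst_iter, None)


def stream_sync(source: Iterable[Row], dest: Iterable[Row]) -> list:
    """Return the changes that would make dest match source."""
    changes = []
    for src, dst in full_outer_join(source, dest):
        if dst is None:
            changes.append(("INSERT", src))
        elif src is None:
            changes.append(("DELETE", dst))
        elif src != dst:
            changes.append(("UPDATE", src))
    return changes
```

Every row on both sides is read once and compared, which is why this approach is simple but not efficient: there is no drill-down to skip the parts of the table that already match.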

Of course, I’ll add back the top-down and bottom-up algorithms later, as well as some others. They should turn out to be pretty simple to implement, since they won’t have, for example, locking code intertwined with them. When done, the tool will examine the table and figure out the best algorithm to use. This will go a good way towards another of my goals, which is that you should be able to just point it at two tables and tell it to sync them, and it should do it in the most efficient way possible, without needing lots of command-line options.


Maatkit bounty begins tomorrow

Tomorrow is the first of five days I will spend working on mk-table-sync, the data synchronization tool I developed as part of Maatkit. The first thing I’ll do is pull the row-checksumming code out into a module and write a unit test suite for it. I’ll probably add the code to the module that does checksums for mk-table-checksum, since it is not all that different.

My mind is not fresh on the code, but I think the next thing after that will be to pull out the code that finds differences in two sets of rows. It is largely identical for the two algorithms (which I called top-down and bottom-up for lack of better ideas). My plan is to use a callback function to abstract away the functionality that’s not the same. In other words, I’ll write code that accepts two sets of rows and a reference to a subroutine, and when it finds a difference between the rows it will call the subroutine.
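
The callback plan might look something like the sketch below, written in Python for illustration. The function name, the `key` parameter, and the shape of the callback’s arguments are assumptions, not anything from the existing script.

```python
def find_differences(left_rows, right_rows, on_difference, key=lambda row: row[0]):
    """Call on_difference(key, left, right) for each row that is not identical on both sides.

    A row missing from one side is passed as None.
    """
    right_by_key = {key(row): row for row in right_rows}
    for row in left_rows:
        k = key(row)
        other = right_by_key.pop(k, None)
        if other != row:
            on_difference(k, row, other)
    # Anything still in the dict exists only on the right-hand side.
    for k, other in right_by_key.items():
        on_difference(k, None, other)
```

The comparison logic stays the same whichever algorithm calls it; only the subroutine passed in as `on_difference` decides what to do about a difference.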

This is a bit speculative, but the next step after that is probably to write modules for the top-down and bottom-up code too.

Then the rest of the program becomes “glue” for these tested modules. A lot of the functionality is already in modules I built for other tools, such as the code that parses a table definition, finds an optimal index, etc. I’m not sure how much of the code I’ve already written (and tested) will be able to replace parts of the current non-modular script, but I think it’ll be a lot. And I’ll just have to see what’s left over and how much of that fits into yet more modules. With yet more test suites.

The features I’m planning to implement, as well as the bugs I’m planning to fix, are all in the bug tracker at Sourceforge.


Progress on High Performance MySQL, Second Edition

It’s been a while since I said anything about the progress on the book. That doesn’t mean we are not still working on it, though.

As Peter wrote a while ago, he is basically wearing the hat of a very advanced technical reviewer at this point. We’ve finished writing all the chapters from his detailed outlines. He has worked through about half the chapters, and I’m continuing to spend my evenings and weekends and holidays (yes, nearly all my free time — just ask my wife!) writing some new material (an appendix on EXPLAIN, for example), finishing unfinished things marked with TODO in the text, and revising chapters after Peter reviews them. Vadim is working on benchmarks. For example, he just finished some benchmarks for something I profiled with SHOW STATUS. I thought that would be good enough to assert something about the performance. Sure enough, SHOW STATUS says it does less work, but Vadim’s benchmarks show it’s slower :-) This is why we check each other’s work!

The core chapters on MySQL performance — beginning with Benchmarking and Profiling, and continuing through Optimizing Server Settings — are the ones Andy Oram, our editor, thinks we should put the most effort into, and I agree. We will probably circle back and go through another review/edit cycle before we release them for technical review. Some of the other chapters, such as Replication, are already out for technical review.

Despite the fact that all of the chapters and appendixes are theoretically a “first draft,” as of several weeks ago, there is still a lot of work to do. Depending on the chapter, it can take me a solid weekend to revise one after Peter reviews it. Each little thing anyone points out (does MySQL version X really do Y by default?) requires some research, testing, benchmarks, or even reading the source code.

Some miscellanea:

  • The production staff replied to my inquiry to the editor to say that yes, we will be able to have references that point to a specific page number. This was a big relief to me. It requires extra work, but makes the book so much more valuable as a reference work in my opinion. To see why, look at the top of page 151 in the first edition, which just refers to chapters and sections by their titles: “See… the ‘Tools’ section…” Now try to find the “Tools” section. If it took you a while… well, the first time I did it, I missed it, and thought it might mean the Tools chapter. The second edition will say “The X section on page Y” or similar. (Okay, I’ll shut up about this now; everyone has to have a pet peeve, eh?)
  • We are currently at 425 pages in OpenOffice.org Writer, which by my calculations puts us around 470 pages in print. As I said before, I think we’ll break 500 pages by the time we finish the rest of the missing material.
  • Andrew Aksyonoff has contributed an appendix on the Sphinx full-text search system. If you don’t know anything about it, check it out. It’s an amazing piece of software that does a lot more than just full-text search.

Well, I’ve run out of my allotted thirty minutes of blogging! Back to the salt mines! Just kidding… I’m actually off to the climbing gym soon to get my mind off it.


Favorite USB wireless card for Ubuntu?

Dear LazyWeb,

Do you have a favorite USB wifi network card for a laptop running Ubuntu?

Sincerely, Xaprb


Maatkit version 1314 released

Download Maatkit

Maatkit (formerly MySQL Toolkit) contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.

This release fixes several minor bugs. It also renames all the tools to avoid trademark violation, completing the project rename. (Let me know if I missed anything.)

Changelog for mk-find:

2007-11-25: version 0.9.7

   * Added --sid option.

Changelog for mk-show-grants:

2007-11-25: version 1.0.5

   * --askpass ignored the entered password (bug #1838131).

Changelog for mk-table-checksum:

2007-11-25: version 1.1.20

   * --replcheck didn't recurse; it should recurse one level (to slaves).

Four companies to sponsor Maatkit development

A while ago I asked for people and/or organizations to sponsor development on Maatkit (formerly MySQL Toolkit) so I could take a week off work and improve the Table Sync tool. I asked for $2500 USD, but several companies have graciously offered to cover that and then some.

I’m very happy about this, as it will allow me to dedicate a solid week to fixing bugs and adding features. There’s a lot of demand for the tools, and there are a dozen or so bug reports unresolved for the table-sync tool, which I personally want to fix as much as anyone. So I’m very grateful for the support.

Here are the companies who have promised their financial support:

MySQL AB


MySQL AB have offered $3000 USD in support. I had an email conversation with Mårten Mickos, MySQL’s CEO, and he expressed his happiness about the project’s success, and his pleasure in supporting the project:

We have seen you operate in the community and you always have constructive and good ideas. That’s why we want to support you. Our goal with this is to stimulate innovation in the MySQL ecosystem.

I don’t know how the idea to support the project started at MySQL AB, but that quote really tells me “we get it: we have a symbiotic relationship with our community of users.” In a follow-up email, Jay Pipes wrote,

… MySQL wants to make it clear that we very much support and appreciate the work you’ve done on the toolkit. It has proven to be one of, if not the, most popular and successful open source ecosystem projects surrounding MySQL and for good reason. So, for your work and commitment to the project, a big thank you from MySQL. :)

Secondly, we would like to encourage you to be open and public about our support of you. The community team is always looking for opportunities such as the one which presented itself with your toolkit, and we want the outside community to know about our support and encouragement. Therefore, you have our blessing and encouragement to blog about the sponsorship of your development work. Please do let us know if and when you decide to blog about it. Remember also that this sponsorship is no strings attached. There is no expectation of specific work on our end.

Blue Ridge Internetworks


Blue Ridge Internetworks have offered $1000 USD in support. BRIworks, as they’re known locally, is headquartered in the town where I live, Charlottesville, Virginia. They offer networking consulting and services. Jeff Cornejo, who offered the support to me, is a friend and used to work where I used to work, and several other highly respected friends and ex-co-workers work at BRIworks too. BRIworks provides Internet service and hosting for my employer.

Percona


Percona have offered $500 USD in support. Percona does high-performance website consulting, and is perhaps best known for having some of the world’s top MySQL experts, including Peter Zaitsev and Vadim Tkachenko, two of the co-authors on High Performance MySQL, second edition.

The Rimm-Kaufman Group


Last, but absolutely not least, is my employer, The Rimm-Kaufman Group, who do paid search marketing and website effectiveness consulting. They have let me spend a significant amount of time writing these tools for use on our own systems, and instead of keeping them in our own Subversion repository, allowed the code to be released as Free Software. The time I’ve spent on the tools has gone well above and beyond what we needed to get our work done. Finally, RKG has blessed my unpaid week off to work on the tools.

A big thanks is due to all of these companies and individuals, as well as other people who have contributed financially and otherwise.

Closing thoughts

I’m grateful for the sponsorship, but I think the real winners are the MySQL community, who have benefited a lot from Maatkit. It has made a lot of hard things easier and impossible things possible. If you’re one of those who benefits from Free Software, I encourage you to patronize the businesses that believe in and support it. Four fine examples are listed above! Not coincidentally, all of them are the creme de la creme in their respective fields.

Finally, a quick journalistic note: I pre-approved this post with representatives from the companies I mentioned, because I respect their right to represent themselves as they wish, but the words are mine, not theirs.


Why is Embarq hijacking my DNS?

Isn’t this the same thing that happened a few years ago with ICANN or Verisign or one of those big names? (Strangely, I can’t find relevant search results about this!)

I clicked on my toolbar shortcut for Toggl and my Embarq DSL service redirected me to a search-results page instead of telling my browser the truth. This makes me mad. The core layers of the Internet are designed the way they are for a reason and I don’t want to “opt out” of a stupid DNS hijacking stunt I never opted into.

Here’s a screenshot of what happens when I type in any old non-existent (or, in Toggl’s case, timing-out) domain name.

[Screenshot: Embarq screwing with my DNS]

And here’s what happens when I do a DNS lookup:

baron@kanga:~$ dig www.toggl.com

; <<>> DiG 9.4.1-P1 <<>> www.toggl.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27795
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.toggl.com.                 IN      A

;; ANSWER SECTION:
www.toggl.com.          22      IN      A       66.199.249.106

;; Query time: 72 msec
;; SERVER: 208.33.159.39#53(208.33.159.39)
;; WHEN: Fri Nov 23 15:50:14 2007
;; MSG SIZE  rcvd: 47

baron@kanga:~$ ping www.toggl.com
PING www.toggl.com (66.199.249.106) 56(84) bytes of data.
64 bytes from 66-199-249-106.reverse.ezzi.net (66.199.249.106): icmp_seq=1 ttl=53 time=79.2 ms
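
The same check can be scripted. The Python sketch below asks the system resolver for a few random names and reports whether they all resolve anyway. The function name and parameters are invented for illustration. It also assumes that a random 32-character label under `.com` is not registered, which is overwhelmingly likely but not guaranteed.

```python
import socket
import uuid


def looks_hijacked(resolve=socket.gethostbyname, probes=3, suffix=".com"):
    """Return True if random, presumably nonexistent names still resolve to an address."""
    for _ in range(probes):
        name = uuid.uuid4().hex + suffix  # assumption: nobody has registered this
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            return False  # the resolver admitted the name does not exist
    return True  # every bogus name got an answer, as in the dig output above


if __name__ == "__main__":
    print("DNS answers look rewritten" if looks_hijacked() else "resolver reports missing names honestly")
```

The `resolve` parameter exists only so the check can be exercised without touching the network.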

Did I mention that this makes me mad? Time to get on the phone.

PS: it looks like Verizon is doing it too.


Maatkit version 1297 released

Download Maatkit

Maatkit (formerly MySQL Toolkit) version 1297 contains a significant update to MySQL Table Checksum (which will be renamed soon to avoid trademark violations). The changelog follows. What you don’t see in the changelog is the unit test suite! I got a lot more of the code into modules that are tested and re-usable.

2007-11-18: version 1.1.19 

* Check for needed privileges on --replicate table before beginning. 
* Made some error messages more informative. 
* Fixed child process exit status with 8-bit right-shift. 
* Improved checksumming code auto-detects best algorithm and function. 
* Added --ignoreengine option; ignores federated and merge by default. 
* Added --columns and --checksum options. 
* Removed --chunkcol, --chunksize-exact, --index options. 
* --chunksize can be specified as a data size now. 
* Improved chunking algorithm handles more cases and uses fewer chunks. 
* Do not print --replcheck results for servers that are not slaves. 
* Create only one DB connection for each host, not one per host/tbl/chunk. 
* Code assumed backtick quoting, broke on SQL_MODE=ANSI (bug #1813030). 
* There were many potential bugs with database and table name quoting. 
* Child exit status errors could be masked by subsequent successes.
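
For anyone wondering about the “8-bit right-shift” and “masked by subsequent successes” entries: on Unix-like systems the status a parent gets back from waiting on a child conventionally packs the exit code into the high byte and the terminating signal into the low bits. The Python sketch below illustrates that convention. It is not the tool’s Perl code, and both function names are made up.

```python
def decode_wait_status(status):
    """Split a conventional 16-bit wait status into (exit_code, signal)."""
    exit_code = status >> 8  # high byte: the value the child passed to exit()
    signal = status & 0x7F   # low 7 bits: terminating signal, 0 if none
    return exit_code, signal


def overall_exit_code(statuses):
    """Keep the largest exit code seen, so a later success cannot hide an earlier failure."""
    return max((decode_wait_status(s)[0] for s in statuses), default=0)
```

Without the shift, a child that exits with code 1 shows up as 256. And if the parent simply remembers the most recent child’s status, one clean exit at the end makes the whole run look successful.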

New Maatkit release policy

Download Maatkit

Maatkit (formerly MySQL Toolkit) has for some time been released both as a bundle, and as individual tools. It’s too much work to maintain the individual packages, and I don’t think it really benefits anyone much, if at all. While the tools will still be versioned separately, I’m going to discontinue releases of the individual packages, and just release the one uber-package from now on.

This will also make it easier for me to manage the name change, but that’s just an extra incentive; I’ve been considering this for a while.

By the way, Sourceforge indicated it would take up to a couple of days to finish the project’s rename, but it took only a few minutes. Lots of broken links; I’ve asked for a permanent redirect from the old URLs to the new.


MySQL Toolkit is now Maatkit

I am so lucky I married an archaeologist.

Choosing a new name for MySQL Toolkit has been a hassle. I wanted to avoid a literal name, such as, um, MySQL Toolkit. Short is good. And so on, and so on. All the while, the Phoenix/Firebird/Firefox naming debacle was in my thoughts. I only want to do this once.

At first I tried not to stray too much from the current name. MyToolkit, eh, it’s okay, but a) it’s taken and b) it reminds me of Microsoft Windows, where everything is “my.” My Documents, My Pirated Music, My… you get the idea. I tossed out various combinations of Xaprb and Toolkit. Xaprb is unique, and it’s not completely unknown anymore, but it’s not that great a name. (For those who don’t know, it’s a total geek-out thing. It’s what you get when you type my first name on a keyboard that’s been remapped to Dvorak.) XAToolkit seems cool at first, but is this thing really about XA transactions? … No.
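
The Dvorak remapping is easy to reproduce. This small Python sketch maps each key position on a US QWERTY keyboard to the character the standard Dvorak layout produces there. It assumes the first name in question is “baron”, as the shell prompt elsewhere in this archive suggests, and it covers only the lowercase letter rows.

```python
# Same physical key positions, row by row: top, home, bottom.
QWERTY = "qwertyuiopasdfghjkl;zxcvbnm,./"
DVORAK = "',.pyfgcrlaoeuidhtns;qjkxbmwvz"

TO_DVORAK = str.maketrans(QWERTY, DVORAK)


def type_on_dvorak(text):
    """Return what appears when the QWERTY keys for `text` are pressed on a Dvorak-mapped keyboard."""
    return text.lower().translate(TO_DVORAK)


print(type_on_dvorak("baron"))  # prints "xaprb"
```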

I tried to think of some mythical figure, such as an Egyptian god(dess). Oooh! Thoth is the god of writing, the scribe, record-keeping, etc… that’s related to databases, right? But it’s actually pronounced ‘toe-th’ so no one would ever find it, and I’d have to correct people at conferences and such (ack!). And anyway, that’s an appropriate name for a database, not a set of tools for augmenting a database. (Seshat got eliminated for the same reasons, though she’s even cooler than Thoth.)

On the topic of mythical figures, Sargon, Hammurabi, Ashurbanipal, and Gilgamesh are all wicked cool (and they’re not all mythical), but not good names for the toolkit. (Neither is Engelbert Humperdinck, but that’s another blog post.)

What to do?

Ask my wife, of course. She is a Near East Archaeologist, among her many areas of expertise. She’s wonderfully clever. I must say, she was initially too clever for the task. She wrote me an email suggesting “IT Toolk (get it?)” I did not get it, and she didn’t reply to my “I give up” email for a while, so I was left to agonize over what I was missing. Is it a name of an Assyrian scribe? A word in some language only she can read…? No. It’s “Toolkit” with the last two letters placed first. That was anti-climactic. But when I started picking her brain, she immediately thought of Ma’at.

Ma’at is not only an Egyptian word, she’s a goddess. (We need more women in this profession!) She is the patron of truth, harmony, and order. She restores things to their proper balance and place. Without her, everything would return to chaos. She wears an ostrich feather, and the heart of a deceased person has to be weighed against the feather when passing to the underworld.

In fact, ma’at isn’t just a name and a word, it’s a concept, as my wife explained to me. This concept doesn’t have an exact parallel in other languages and cultures. You should read about it via the link I just gave — it’s really quite a fascinating bit of Egyptology. I asked my wife to find a good image of weighing the heart of the deceased, and she took some time to describe the scene:

[Image: Weighing of the Heart]

The deceased is in white. He is visible along the top of the image, in front of a dozen or so judges. He’s visible again in the main part of the image. Anubis is leading him by the hand into the presence of Osiris. Osiris is not the figure kneeling under the scale — that’s Anubis again.

The scale of judgment has the deceased’s heart on the left side, in a jar, and the feather of justice (ma’at) on the right side. The Egyptians believed that the heart, not the brain, was the source of one’s personality and identity. The goddess Ma’at is on top of the middle of the scale, supervising the proceedings. She has the feather of justice on her head. Thoth, the god of writing (a personified Ibis), takes down the judgment. If the deceased passes judgment, he will continue to the underworld; otherwise he will be eaten by the “devourer,” who is part lion, part alligator, and sits under the right-hand arm of the scale.

In this scene the judgment was positive and Horus (the son of Osiris — a hawk) leads the deceased towards the canopy where Osiris awaits. Osiris is flanked by two goddesses. One is Isis; Nephthys is probably the other.

I don’t know about you, but I think this is all very interesting; maybe I should become an Egyptologist.

But best of all, ma’at applies to the tools I’ve written, too! Without them, your replication gets out of sync, and you don’t even know it. Fortunately, the tools let you bring things back to the way they should be, restoring order to your universe of data. And so on.

In the end, maybe Maatkit is not the greatest name for various reasons (Ma’at was already taken on Sourceforge, by the way), but it’s so freakin’ cool that I can’t pass it up. You can’t find a perfect name, anyway; if it’s good by one metric, it’s bad by another. Maatkit it is.

I’m going to be changing the toolkit’s name on Sourceforge quite soon. There’s also some other interesting stuff going on, which I’ll write about separately.

(Ma’at is pronounced “mott,” by the way.)
