Archive for the 'Maatkit' Category

Maatkit in RHEL and CentOS

Update: Karanbir says “Just one thing to keep in mind is that we dont want too many people using it from the Testing repository - we only need enough feedback to move it from testing to stable ( and to be honest, there are already 8 people who have said yes it works - so move to stable should happen within the next 24 - 48 hrs ). Once the package is in stable, users on CentOS4 and 5 wont need to do anything more than just ‘yum install maatkit’ and it will install for them.”

At least one person (Karanbir Singh) is working to get Maatkit into the CentOS repositories, and I believe there might be movement towards RHEL also. From an email to the Maatkit discussion list a little while ago,

I am in the process of getting maatkit into the CentOS-Extras repositories. The first step for that is that every package needs to go into a CentOS-Testing repo and feedback is required from the project and users on its stability / usability and packaging quality.

maatkit-1887 is now available in the CentOS-Testing[*] repo’s and as soon as we can get some feedback ( needs to be 5 different people, none of whom can be CentOS Developers ) - the packages will move into the main repository so that all users can get access.

I’d appreciate it if people on this were able to give those packages a go and let me know if there are any issues. You can leave feedback :

  • via the maatkit-discuss mailing list (http://sourceforge.net/mailarchive/forum.php?forum_name=maatkit-discuss)
  • on the centos-devel list ( http://lists.centos.org ) or
  • http://bugs.centos.org/ against category ‘maatkit’

[*] : Info about the Testing repo and howto set it up on your machine : http://wiki.centos.org/Repositories

If you’re interested in getting Maatkit into these repositories, please take a moment and give the requested feedback. I can’t do it because it would be a conflict of interest for the main developer to assert that the code is stable and usable.

Technorati Tags:, ,

You might also like:

  1. innotop is available from openSUSE buildservice

MySQL Conference and Expo 2008, Day One

Today is the first day at the conference (aside from the tutorials, which were yesterday). Here’s what I went to:

New Subquery Optimizations in 6.0

By Sergey Petrunia. This was a similar session to one I went to last year. MySQL has a few cases where subqueries are badly optimized, and this session went into the details of how this is being addressed in MySQL 6.0. There are several new optimization techniques for all types of subqueries, such as inside-out subqueries, materialization, and converting to joins. The optimizations apply to scalar subqueries and subqueries in the FROM clause. Performance results are very good, depending on which data you choose to illustrate. The overall point is that the worst-case subquery nastiness should be resolved. I’m speaking of WHERE NOT IN(SELECT…) and friends. It remains to be seen how this shakes out as 6.0 matures, and what edge cases will pop up.

The Lost Art Of the Self Join

This was just great. Among many other things, Beat Vontobel showed how a Su Doku can be solved entirely with declarative queries: a very large self-join query against a table of digits and a table of the board’s initial state. I had been promoting this session because last year’s was so very good. I can’t wait to see what he comes up with for next year. Can he find another creative idea? Time will tell.

He wasn’t able to solve a 9×9 puzzle with MySQL because of the limitation on the number of joins, but PostgreSQL had no trouble doing it.

EXPLAIN Demystified

This was my session, of course. (Slides will be on the O’Reilly conference site, if they aren’t already). It went great, I thought. The room was full and people were standing in the back of the room and in the door. The questions came fast and furious; all really good questions. I think we ended up exploring a lot of the MySQL query execution method, strengths, and weaknesses by the time we were through. And I gave away all the remaining Maatkit t-shirts. Hopefully the people who took them will wear them tomorrow and the conference will be sea of deep, rich red shirts.

Someone did an audio recording of the session, but I don’t recall who it was.

Investigating InnoDB Scalability Limits

This session was given by Peter Zaitsev (disclosure: I now work for Percona, the company he co-founded). Peter and Vadim Tkachenko spent a lot of time over the last weeks and months running a dizzying array of benchmarks on MySQL 5.0.22, 5.0.51, and 5.1.24 (if I recall the versions correctly). They were able to show InnoDB’s scaling patterns for a number of different micro-benchmarks on a variety of configurations. If you didn’t attend, please look up the slides if you care about InnoDB performance. A lot of work went into the benchmarks — a lot of work. The slides should be on the conference website or on our blog, http://www.mysqlperformanceblog.com/.

Replication Tricks and Tips

Lars Thalmann and Mats Kindahl gave this session. At a high level, I’d say it was a run-down of all the different ways you can use MySQL replication. Replication is really a flexible tool, and they covered a large array of the most important ways you can use it to achieve different purposes. Many of the techniques they mentioned are implemented by various tools in Maatkit. A couple of the others are implemented in MySQL Master Master Manager and MySQL Semi Multi-Master tools. Don’t re-code these! You can save weeks of work and get quality code by using the pre-built tools. (I built Maatkit, so I know exactly how tricky it is to get some of these things right.)

BoF Sessions

I dropped in on a few BoF sessions, including the Sphinx one and the PBXT/Blob Streaming one. (Keep an eye on the PrimeBase folks — they are up to great things.) Ronald Bradford protected me from those who wanted to get me drunk. Hint: it’s really easy… I have to say, though, Monty’s black vodka was amazing.

Speaking of Blob Streaming, Paul McCullagh and I were talking earlier in the day about the project’s name, MyBS. This has been smirked about a few times. I think it’s a great name, because after all my initials are BS (I usually insert one of my four middle names in to alleviate this problem, but I digress). The conversation went like this:

Me: I like it. My initials are BS.

Paul: BS actually means British Standard, so it can’t be bad.

Me: Better than American Standard. That’s a toilet.

We also debated the merits of watching the original move The Blob. It’s a classic. It must be good.

Technorati Tags:, , , , , , , , , , , , , , ,

You might also like:

  1. Like it or not, it is the MySQL Conference and Expo
  2. Speed up your MySQL replication slaves
  3. MySQL Conference and Expo 2008, Day Three
  4. Baron Schwartz on a podcast at MySQL Conference and Expo 2008
  5. MySQL Conference and Expo 2008, Day Two

MySQL Community Member of the Year

MySQL just gave me an award at this morning’s keynote, along with Sheeri Kritzer Cabral (for the second year in a row!) and Diego Medina, for my code contributions to the MySQL community, specifically Maatkit, which makes it easier to make MySQL reliable, fast, and robust. It’s an honor to be recognized. And while I could leave it at that, I’d like to say a word or two more.

The economy, community, and ecosystem that’s building around Free Software can often be very rewarding financially. This is a great motivation; being rewarded for your efforts is one of the chief virtues of a culture of entrepreneurship, along with the idea that to try and fail is just as noble as to succeed. But I find that isn’t enough. If I were only rewarded financially and with recognitions such as this morning’s, I would quickly become bankrupt at a deeper level. I would become focused on external measures of success, such as accolade and wealth.

That’s why it’s so important to be of service to others and to work for the good of all. This is one of the strongest counterbalances for me. It helps keep me humbler and more open.

In the end, Free Software is all about this. It reminds me always that we are all interconnected, and that to work for your highest good is to work for my own.

I believe we all need at least these three things deeply:

  • To chart your own course in life.
  • To be of service to others.
  • To make the most of what you have.

Does proprietary software offer you the chance to do this? No, it does not. It makes you beholden and dependent, not free. To pursue these three goals to their maximum extent you need freedom. “Make the most of what you have” doesn’t imply that you have to just accept what’s given to you; you can also take some time to see what your choices are, and choose something that gives you more freedom if possible. That’s what I did years ago when I moved away from using proprietary software.

I hope you’ll give this a try yourself: contribute what you build internally in your company, and put in the extra effort to make it really high quality and useful for everyone. This is how Maatkit started. Don’t wait for others to make it happen: chart your own course.

This morning’s award is most important to me because it reinforces that I’m serving others well.

Technorati Tags:,

You might also like:

  1. The truth about MySQL Community and Enterprise
  2. Announcement: Xaprb scripts are re-licensed
  3. Maatkit t-shirts are here
  4. Four companies to sponsor Maatkit development
  5. MySQL: Free Software but not Open Source

Maatkit t-shirts are here

Maatkit

I’m at the MySQL Conference and the t-shirts I created for Maatkit have arrived. Come get yours! They are high-quality, attractive shirts you’ll be proud to wear, and they are a nice rich wine-red color.

Harrison Fisk (co-author of MySQL Clustering) got the first one, because he told me that he recommends Maatkit to MySQL Support customers about twice a week. I made sure to save one for Jay Pipes too, because his luggage got lost so he has nothing to wear. Unfortunately, I didn’t make any Maatkit underwear, sorry Jay. Now I know what to do for next year…

I’ve already given out a whole bunch of them, and at this rate all 50 of them might not last the day!

Technorati Tags:, , ,

You might also like:

  1. MySQL Conference and Expo 2008, Day One

I have joined Percona

Effective April 1, I will join Percona full-time as a consultant. I’ll be helping people build high-performance applications with MySQL, but I’ll also be continuing to develop and improve tools such as Maatkit.

This career change has been a long time in progress. I’m really looking forward to it, but at the same time it’s hard to leave my current employer, The Rimm-Kaufman Group (RKG). Working with them has been the best job I’ve ever had. But ultimately, my dream job is to help as many people as I can, and consulting will be a better way to do that.

At a time like this, I like to reflect on the trail that has led here. It’s a good opportunity to realize how fortunate I really am and fill up my gratitude tank. So I’d like to thank everyone who has helped me reach this point. All the people who have encouraged me, sponsored me, suggested new options… all kinds of help. A special thanks to my wife Lynn, to Alan Rimm-Kaufman and all my colleagues at RKG, to the many fine people at MySQL, and to the MySQL community as a whole. My deep gratitude to all of you. I look forward to working with you even more in the future.

Technorati Tags:, , ,

You might also like:

  1. Four companies to sponsor Maatkit development
  2. innotop 1.4.0 released

Maatkit version 1877 released

Download Maatkit

Maatkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.

This release contains major bug fixes and new features. Some of the changes are not backwards-compatible. It also contains new tools to help you discover replication slaves and move them around the replication hierarchy.

Changelog for mk-archiver:

2008-03-16: version 1.0.8

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * Changed short form of --analyze to -Z to avoid conflict with --charset.

Changelog for mk-deadlock-logger:

2008-03-16: version 1.0.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).

Changelog for mk-duplicate-key-checker:

2008-03-16: version 1.1.5

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-find:

2008-03-16: version 0.9.10

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-heartbeat:

2008-03-16: version 1.0.8

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-parallel-dump:

2008-03-16: version 1.0.7

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * A global database connection was re-used by children, causing a hang.

Changelog for mk-parallel-restore:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Changed --charset to be compatible with other tools (bug #1877548).

Changelog for mk-query-profiler:

2008-03-16: version 1.1.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-show-grants:

2008-03-16: version 1.0.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-slave-delay:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).

Changelog for mk-slave-find:

2008-03-16: version 1.0.0

   * Initial release.

Changelog for mk-slave-move:

2008-03-16: version 0.9.0

   * Initial release.

Changelog for mk-slave-prefetch:

2008-03-16: version 1.0.1

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-slave-restart:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * Added logic to repair tables, and rewrote a lot of code.
   * Added --always option, disabled by default.  Not backwards compatible.
   * --daemonize did not work.
   * --quiet caused an undefined variable error.

Changelog for mk-table-checksum:

2008-03-16: version 1.1.26

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).
   * Added --unique option to mk-checksum-filter.
   * The exit status from mk-checksum-filter was always 0.
   * mk-table-checksum now prefers to discover slaves via SHOW PROCESSLIST.

Changelog for mk-table-sync:

2008-03-16: version 1.0.6

   * --chunksize was not being converted to rowcount (bug #1902341).
   * Added --setvars option (bug #1904689, bug #1911371).
   * Deprecated the --utf8 option in favor of the A part in DSNs.
   * Mixed-case identifiers caused case-sensitivity issues (bug #1910276).
   * Prefer SHOW PROCESSLIST when looking for slaves of a server.

Changelog for mk-visual-explain:

2008-03-16: version 1.0.7

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
Technorati Tags:, ,

You might also like:

  1. Maatkit version 1314 released
  2. Maatkit version 1709 released
  3. Maatkit version 1674 released
  4. Maatkit version 1753 released
  5. Maatkit version 1508 released

A very fast FNV hash function for MySQL

I wrote a User-Defined Function that implements the FNV (Fowler-Voll-No) hash function for MySQL. I’m not the first person to do this — in fact, I was inspired by the Google patches for MySQL. But my implementation is a little bit different from most, in a very important way that leads directly to much higher performance, especially suited for the Maatkit tools.

A bit of background: FNV hashing is a very fast hash algorithm that operates in fixed memory. It is widely used in lots of important areas in computer science. My implementation requires absolutely no malloc() calls, which is a darn good thing because I am not to be trusted with malloc(), having spent too many years programming in managed languages. I made it return a 64-bit integer, which matches the size MySQL uses internally for most integer arithmetic.

The most important thing I did was make my UDF accept 1 to infinity arguments. That means you can hash entire rows with a single function call. And that is very useful for the Maatkit table-checksumming tools, which tend to run about 8-10 times faster when they don’t have to make MySQL do a bunch of string concatenation. That translates directly to less impact on the server, and less slave lag (if that is a problem for you).

Here’s how my implementation works:

SELECT FNV_64(col1, col2, col3, .... colN) FROM ...

Compare this to MD5() hashing that accomplishes the same thing:

SELECT MD5(CONCAT_WS('#', col1, col2, col3, .... colN)) FROM ...

The UDF’s code is distributed with Maatkit, and I plan to eventually build it as a binary that can be installed without requiring you to compile it. However, compiling is very easy; there are instructions in the source code comments. Installing is also easy: just a simple SQL statement.

If you’re using Maatkit to make sure your slaves have the same data as their master, you should install the UDF on all your servers for a significant performance boost. You’ll save your servers a lot of work. You don’t need to do anything extra for Maatkit to take advantage of it. Maatkit will auto-detect it if you have it installed.

I’ve been running it in production for a couple of months now with nothing but good results. And the code is drop-dead simple, so I think the chance of bugs is very slim. But if you have questions, problems, bug reports etc, please use the Maatkit project page to report them.

Technorati Tags:, , , ,

You might also like:

  1. Speed up your MySQL replication slaves

How to sync tables in master-master MySQL replication

Suppose you have a master-master replication setup, and you know one of the tables has the wrong data. How do you re-sync it with the other server?

Warning: don’t just use any tool for this job! You may destroy your good copy of the data.

If your table is large, you’ll probably want to use a tool that can smartly find the differences in a very large dataset, and fix only the rows that need to be fixed. There are several tools that are either able to do this, or claim to be able to do this. However, most of them are not replication-aware, and are likely to either break replication or destroy data.

To see why this is, let’s look at a typical scenario. You have server1 and server2 set up as co-masters. On server1, your copy of sakila.film has correct data. On server2, somehow you are missing a row in that table. A hypothetical sync tool will compare the two copies of the data and find the missing row, then insert it on server2. This INSERT statement will flow through replication to server1, where it will cause a duplicate key error and stop replication.

You can probably think of many other scenarios with lots of bad side effects, so I won’t list any more. I’ll leave it at this: when you are synchronizing data on a slave (even if it is also a master), you must not change data on the slave. Changing data on the slave can cause so much trouble in so many different ways! The correct way to do this is to make the changes on the master, and let them flow through replication to the slave.

As far as I know, there is only one tool that is capable of doing this. It is mk-table-sync, which is part of Maatkit. However, even this tool will let you point the gun at your foot and pull the trigger, if you don’t use it correctly.

The correct way to sync a master-master setup with mk-table-sync is with the --synctomaster option, which tells it to make changes on the master:

mk-table-sync --synctomaster h=server2,D=sakila,t=film

Notice that I’m connecting to the slave, but instructing it to make changes on the master. (Yes, it is able to find the master by inspecting the slave).

If you do the following, you’ll probably cause problems:

mk-table-sync h=server1,D=sakila,t=film h=server2

I’ve just updated the documentation to point out the subtleties with master-master replication. However, you should always keep in mind: it’s not just master-master replication. Any replication configuration is best synchronized by making the changes on the master, and you should always avoid changing data on a slave — even to “fix” the slave. I might also add a feature to mk-table-sync to warn you when it detects that you are trying to change data on a slave.

Technorati Tags:, , ,

You might also like:

  1. Progress on Maatkit bounty, part 3
  2. Maatkit version 1579 released
  3. How to know if a MySQL slave is identical to its master
  4. How to avoid an extra index scan in MySQL
  5. Why MySQL says the server is not configured as a slave

Maatkit version 1753 released

Download Maatkit

This release contains minor bug fixes and new features. Besides the little bug fixes, there’s a fun new feature in mk-heartbeat: it can auto-discover slaves recursively, and show the replication delay on all of them, to wit:

baron@keywest ~ $ mk-heartbeat --check --host master -D rkdb --recurse 10
master 0
slave1 1
slave2 1
slave3 4

(Not actual results. Your mileage may vary. Closed course, professional driver. Do not attempt).

Nothing else in this release is very exciting. I just wanted to get the bug fixes out there.

Changelog for mk-heartbeat:

2008-02-10: version 1.0.7

   * Added --recurse option to check slaves to any depth.
   * Made mk-heartbeat explicitly close DB connection when done.

Changelog for mk-parallel-dump:

2008-02-10: version 1.0.6

   * Added the --losslessfp option.
   * Fixed child process exit status on Solaris (bug #1886444).

Changelog for mk-parallel-restore:

2008-02-10: version 1.0.5

   * Fixed forking issues with File::Find on Solaris (bug #1887102).
   * Fixed child process exit status on Solaris (bug #1886444).
   * The --defaults-file option caused a mysql error (bug #1886866).

Changelog for mk-show-grants:

2008-02-10: version 1.0.8

   * Added --timestamp option.

Changelog for mk-table-checksum:

2008-02-10: version 1.1.25

   * The --lock option did not work correctly (bug #1884712).

Changelog for mk-table-sync:

2008-02-10: version 1.0.5

   * The Stream algorithm wasn't chosen when a table had no key.
   * Numeric strings beginning with 0 weren't quoted (bug #1883019).
Technorati Tags:, ,

You might also like:

  1. Maatkit version 1709 released
  2. Maatkit version 1674 released
  3. Maatkit version 1877 released
  4. Maatkit version 1314 released
  5. Maatkit version 1579 released

Maatkit version 1709 released

Download Maatkit

This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield’s relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I’m releasing it as part of Maatkit.

Changelog for mk-archiver:

2008-01-24: version 1.0.7

   * Added --quiet option.
   * Added --plugin option.  The plugin interface is not backwards compatible.
   * Added --bulkins option.
   * Added --bulkdel option.
   * Added --nodelete option.
   * Changed negatable --ascend option to --noascend.

Changelog for mk-parallel-dump:

2008-01-24: version 1.0.5

   * The fix for bug #1863949 added an invalid argument to gzip (bug #1866137)
   * --quiet caused a crash.

Changelog for mk-parallel-restore:

2008-01-24: version 1.0.4

   * The -D option was used as a default DB for the connection (bug #1870415).

Changelog for mk-slave-prefetch:

2008-01-24: version 1.0.0

   * Initial release.

Changelog for mk-table-sync:

2008-01-24: version 1.0.4

   * Made the --algorithm option case-insensitive (bug #1873152).
   * Fixed a quoting bug.
   * Made the UTF-8 options configurable.
Technorati Tags:, , , ,

You might also like:

  1. Maatkit version 1877 released
  2. Maatkit version 1674 released
  3. Maatkit version 1753 released
  4. Maatkit version 1579 released
  5. Maatkit version 1508 released