Xaprb

Stay curious!

Archive for the ‘Perl’ Category

How to capture debugging information with mk-loadavg

Maatkit’s mk-loadavg tool is a helpful way to gather information about infrequent conditions on your database server (or any other server, really). We wrote it at Percona to help with those repeated cases of things like “every two weeks, my database stops processing queries for 30 seconds, but it’s not locked up and during this time there is nothing happening.” That’s pretty much impossible to catch in action, and these conditions can take months to resolve without the aid of good tools.

In this blog post I’ll illustrate a very simple usage of mk-loadavg to help in solving a much smaller problem: find out what is happening on the database server during periods of CPU spikes that happen every so often.

First, set everything up.

  1. Start a screen session: screen -S loadmonitoring. If you don’t have screen, you can run mk-loadavg as a daemon, but it’s much better to use screen in my opinion.
  2. Get mk-loadavg. For purposes of this blog post, I’m going to get the latest trunk code, because I know a bug or two has been fixed since the last release. wget http://www.maatkit.org/trunk/mk-loadavg
  3. Create a directory to hold the collected information in files. mkdir collected

Next, let’s set up a script that mk-loadavg can run to gather information when it detects a high-CPU condition. Save the contents of this script as “collect-stats.sh”. The script gathers about 30 seconds’ worth of statistics. It uses a simple sentinel file, /tmp/gatherinfo, to prevent overlapping runs from gathering statistics at the same time. (This is intentionally simple for demo purposes.)

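#!/bin/bash
# Gather about 30 seconds of diagnostics into ./collected when triggered.

# Sentinel file: skip this run if another collection is still in progress.
if [ -f /tmp/gatherinfo ]; then exit 0; fi
touch /tmp/gatherinfo
d=$(date +%F-%T)
echo "gathering info for $d"
# Snapshots taken at the start of the window.
ps -eaf >> "collected/$d-ps" 2>&1 &
top -bn1 > "collected/$d-top" 2>&1 &
mysql -e 'show innodb status\G show full processlist\G' >> "collected/$d-innodbstatus" 2>&1 &
# Samples taken once per second for 30 seconds.
vmstat 1 30 > "collected/$d-vmstat" 2>&1 &
iostat -dx 1 30 > "collected/$d-iostat" 2>&1 &
mysqladmin ext -i1 -c30 > "collected/$d-mysqladmin" 2>&1 &
sleep 30
# Second snapshots at the end of the window, appended to the same files.
ps -eaf >> "collected/$d-ps" 2>&1 &
mysql -e 'show innodb status\G show full processlist\G' >> "collected/$d-innodbstatus" 2>&1 &
rm /tmp/gatherinfo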
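
The sentinel-file check in the script leaves a small window between the test and the touch in which two runs could both proceed. If that ever matters, one possible variant relies on mkdir being atomic. This is a sketch, not part of mk-loadavg or the script above, and the names LOCKDIR, acquire_lock, release_lock, and run_collection are hypothetical.

```shell
#!/bin/bash
# Hypothetical lock location; the script above uses /tmp/gatherinfo instead.
LOCKDIR="${LOCKDIR:-/tmp/gatherinfo.lock}"

# mkdir either creates the directory or fails, in a single atomic step,
# so only one caller can succeed at a time.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Run the given command only if no other collection holds the lock.
run_collection() {
    acquire_lock || return 0   # another collection is in progress
    trap release_lock EXIT     # release the lock if the shell exits early
    "$@"
    release_lock
    trap - EXIT
}
```

With this in place, the two sentinel lines and the final rm could be dropped, and the collection commands wrapped in a function that is passed to run_collection.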

Now make the script executable: chmod +x collect-stats.sh. At this point we’re ready to start working. Let’s fire the stats-collection script when the system’s user CPU goes above 40%.

perl mk-loadavg --watch "Server:vmstat:us:>:40" --interval 1 --execute collect-stats.sh
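
To see the number that a watch on the us column of vmstat reacts to, the following sketch pulls the user-CPU figure out of vmstat output by locating the us header. It only illustrates the metric; it makes no claim about how mk-loadavg parses vmstat internally, and the function name vmstat_user_cpu is made up for this example.

```shell
#!/bin/bash
# Print the most recent "us" (user CPU %) value from vmstat output on stdin.
vmstat_user_cpu() {
    awk '
        # Until a header row containing "us" is seen, look for that column.
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "us") col = i
            next
        }
        # Data rows: remember the latest numeric value in that column.
        $col ~ /^[0-9]+$/ { us = $col }
        END { if (us != "") print us }
    '
}

# Example (not run here): take two samples, because the first vmstat row
# reports averages since boot rather than current activity.
#   us=$(vmstat 1 2 | vmstat_user_cpu)
#   [ "$us" -gt 40 ] && echo "user CPU is above 40%"
```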

If the CPU goes over 40%, you'll get a bunch of files in the collected directory, with helpful information to diagnose the problem. This example usage is pretty similar to a real-life one I set up recently. It enabled me to take a methodical approach to the problem:

  1. From the top output I was able to identify that MySQL was causing the spike.
  2. I then looked at the SHOW STATUS output (captured by mysqladmin ext) to see what the database server was doing, using mext as a helper.
  3. From Select_full_scan and Handler_read_rnd_next I isolated table scans as a problem.
  4. From the saved SHOW PROCESSLIST I found problem queries and optimized them.
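
The counter check in step 3 can be approximated without mext. The sketch below reads saved mysqladmin ext output and prints how much one counter grew between consecutive samples. It is a rough stand-in written for this post, not mext itself, and the function name counter_deltas is hypothetical.

```shell
#!/bin/bash
# Print the per-sample increase of one status counter, reading the
# table-formatted output of "mysqladmin ext -i1 -c30" on stdin.
counter_deltas() {
    awk -F'|' -v name="$1" '
        {
            # Rows look like "| Variable_name | Value |".
            key = $2; val = $3
            gsub(/ /, "", key); gsub(/ /, "", val)
        }
        key == name {
            if (seen) print val - prev   # growth since the previous sample
            prev = val; seen = 1
        }
    '
}

# Example (not run here), with the timestamp replaced by the real file name:
#   counter_deltas Select_full_scan < collected/<timestamp>-mysqladmin
```

A counter such as Handler_read_rnd_next that jumps during the spike and is flat otherwise points toward table scans.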

You would be right if you said there are much better tools for finding problem queries -- but remember two things: 1) sometimes clients ask for the lightweight, short-term solution that can be set up in about 5 minutes and checked the next day; and 2) when it is unclear whether queries are the problem, setting up only a query monitor is stabbing in the dark and will not get results.

In addition to watching vmstat to measure the system's CPU usage, mk-loadavg can watch many other things, such as the MySQL server's SHOW PROCESSLIST output, values parsed from SHOW INNODB STATUS, and so on.

Written by Xaprb

October 21st, 2009 at 8:19 am

Posted in Maatkit, Perl, SQL, Sys-Admin, Tools

Maatkit version 4334 released

Maatkit version 4334 is ready for download. I see that I missed posting a release announcement about last month’s release of Maatkit. I’ll try to cover the important bits about the last two releases here. Daniel has been posting the release announcements to the mailing list recently, so I’ll do a bit of copy and paste of what he said too.

We’ve released two new tools: mk-upgrade and mk-log-player. Neither is actually a new script; we have only just added them to the releases. mk-upgrade is the tool that I’ve been blogging and writing about recently. Several people sponsored its development, and some of our clients are using it to mitigate the risk of an upgrade or other change to their production environments. mk-log-player has been around for something like a year, and was used by one of our clients who makes a high-performance appliance. The intention of the tool is to apply a realistic production load to a system in a predictable fashion.

As always, mk-query-digest is one of the tools that we apply the most work to. There are several new features in this release, including the ability to parse binary logs, the ability to analyze memcached traffic, and a ton of work on parsing TCP dumps.

We also realized that when we added configuration files to all the tools, we failed to test on Windows. Naturally, any time you don’t test something, that means that you have broken it. And indeed, all the tools immediately failed to run on Windows, but none of us use them on Windows, so we didn’t notice it until much later. We have fixed that.

At the last minute this month we also added a section to the documentation in each tool, which explains the risks of using that tool. These are power tools for power users, but I still felt that it was appropriate to disclose all the risks involved with using the tool.

Now on to last month’s release.

Last month the big news was that we finished the cleanup of command-line options that we had been working on for several months. As a result, a lot of the tools changed in ways that were not backwards compatible. However, we now have a documented command-line convention, and going forward all of the tools will be very consistent and easy to understand. Maatkit users voted for this on the mailing list, so we felt pretty good about making this incompatible change.

I’m not going to duplicate the change logs as I usually do in these blog posts. I think I’ll leave that for the mailing list announcements. At some point, we are also going to try to get the change logs online on maatkit.org.

Here are links to the two threads on the mailing list that explain the exact changes:

Written by Xaprb

August 3rd, 2009 at 12:26 pm

Posted in Perl, SQL, Tools

Maatkit version 3519 released

Maatkit version 3519 is ready for download. There are a lot of changes in this release, many of which are incompatible with previous releases. There are also a lot of important new features. Read on for the details.

First, thanks to everyone who contributed to this month’s release. A lot of people have jumped into Maatkit and started committing code. I attribute this to deliberately forcing a more open policy with decisions being made on the mailing list, rather than the former policy of “Percona pays for development, so they have more say than you do” — a snobby and ill-advised way to treat an open-source project. If you are interested in contributing to Maatkit, please ask. Subversion commit rights are being handed out willy-nilly. It’s great!

Here’s a synopsis of this month’s most important changes.

  • This is a work in progress. We’re making some pretty large changes, and to help us in the process, we’ve changed a lot of code to be more self-checking and help us find errors we introduce during the process. A lot of the other code is kind of involved in this too, so it’s being bundled together with other functionality. As a result, half the tools are done and half the tools aren’t touched yet. The Maatkit wiki has the details. If you’re curious about the reasoning behind the changes, please read the mailing list archives.
  • Command-line options have changed. The mailing list members decided that Maatkit’s command-line options are too confusing and inconsistent, and voted to do something about it. As of this month, we’re about halfway through a process of converting all the tools to use consistent, carefully-thought-out command-line options. Next month I expect there will be even more changes. Check your wrapper scripts to make sure they don’t use old, deprecated options.
  • Configuration files. We’re adding support for configuration files, as specified and approved on the mailing list. The functionality should be simple for MySQL users to understand.
  • Better test coverage. We’re making the tools more testable. Again, a slow process. Very large portions of the code that is bundled together to make each tool are already tested pretty exhaustively, but there are parts of the tools that aren’t, and some of them are very difficult to test; we’re working on that.
  • Fully integrated documentation. We’re working on integrating all the documentation into the tools, so there is no possibility of mismatch between behavior and documentation. We do this in general by making the tool derive its behavior from its documentation! This has proven to be very successful, and we have work underway to push that practice even further until no command-line help output is hard-coded anywhere.
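
The wrapper-script check suggested in the command-line options item can be sketched as a plain grep for old spellings. The list here is a partial selection of options that the change log in this post reports as removed or renamed, and the function name find_old_options is hypothetical.

```shell
#!/bin/bash
# Report lines in the given files that still use old option spellings.
# The patterns are a partial list taken from the change log in this post.
find_old_options() {
    # -H: print file names, -n: print line numbers, -F: match literally.
    grep -HnF \
        -e '--askpass' \
        -e '--setvars' \
        -e '--usemaster' \
        -e '--allowcache' \
        -e '--maxsleep' \
        -e '--minsleep' \
        -e '--skipcount' \
        -e '--iolag' \
        "$@"
}

# Example (not run here), with a made-up path:
#   find_old_options /usr/local/bin/my-maatkit-wrapper.sh
```

Short options and names such as --time are left out on purpose, because a literal search for them would also match unrelated text such as --timeout.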

Aside from that, we’ve made minor bug fixes and functionality changes this month, with the exception of mk-query-digest, which has some beta functionality and major bug fixes.

The full change log follows.

Changelog for mk-archiver:

2009-05-03: version 1.0.15

   * Added the --config option for issue 231.
   * Added the --help and --verbose options for issue 318.
   * Updates to shared code.

Changelog for mk-audit:

2009-05-03: version 0.9.7

   * Removed the --askpass long option.  Use --ask-pass instead.
   * Removed the --setvars long option.  Use --set-vars instead.
   * Removed the -t short option.  Use --top instead.
   * Added the following options for issue 248:
   *    --charset (-A)
   *    --defaults-file (-F)
   *    --host (-h) (but not implemented yet)
   * Converted script to runnable module (issue 315).
   * Added the --config option for issue 231.
   * Updates to shared code.

Changelog for mk-duplicate-key-checker:

2009-05-03: version 1.2.3

   * Columns with backticks in comments caused a crash (issue 330)
   * Changed the --allstruct option to --all-structs.
   * Changed the --askpass option to --ask-pass.
   * Changed the --engine option to --engines.
   * Changed the --function option to --key-types.
   * Changed the --ignoredb option to --ignore-databases.
   * Changed the --ignoreengine option to --ignore-engines.
   * Changed the --ignoreorder option to --ignore-order.
   * Changed the --ignoretbl option to --ignore-tables.
   * Changed the --setvars option to --set-vars.
   * Removed the -a short option.  Use --all-structs instead.
   * Removed the -c short option.  Use --[no]clustered instead.
   * Removed the -f short option.  Use --key-types instead.
   * Removed the -g short option.  Use --ignore-databases instead.
   * Removed the -E short option.  Use --ignore-engines instead.
   * Removed the -n short option.  Use --ignore-tables instead.
   * Added config file handling and --config (issue 231).
   * Converted script to runnable module (issue 315).

Changelog for mk-parallel-dump:

2009-05-03: version 1.0.15

   * Columns with backticks in comments caused a crash (issue 330)

Changelog for mk-query-digest:

2009-05-03: version 0.9.5

   * The query report printed duplicate table names (issue 337).
   * Print a message and exit early if there's an error (issue 190).
   * Added the --config option for issue 231.
   * Added the --log option for issue 241.
   * Added the --help and --verbose options for issue 318.
   * Fixed another crash when taking sqrt() of a negative number (issue 332).
   * Fixed a division by zero when a query has zero exec time.
   * Added --print to print query events in slow-log format.
   * Added --type to specify the type of log file (default slowlog).
   * Added --tcpdump to permit parsing output of tcpdump (issue 228).
   * The --shorten option was implemented badly and was slow (issue 336).
   * The report's per-class QPS was calculated incorrectly (issue 326).
   * Updates to shared code.

Changelog for mk-query-profiler:

2009-05-03: version 1.1.15

   * Added the --config option for issue 231.
   * Converted script to runnable module (issue 315). 
   * mk-query-profiler only:
   *    Removed the --allowcache long option.  Use --allow-cache instead.
   *    Removed the --askpass long option.  Use --ask-pass instead.
   *    Removed the --setvars long option.  Use --set-vars instead.
   *    Removed the -a short option.  Use --allow-cache instead.
   *    Removed the -c short option.  Use --calibrate instead.
   *    Removed the -e short option.  Use --external instead.
   *    Removed the -f short option.  Use --flush instead.
   *    Removed the -i short option.  Use --innodb instead.
   *    Removed the -n short option.  Use --only instead.
   *    Removed the -s short option.  Use --separate instead.
   *    Removed the -t short option.  Use --tab instead.
   *    Removed the -r short option.  Use --verify instead.
   * mk-profile-compact only:
   *    Removed the -q short option.  Use --queries instead.
   *    Removed the -m short option.  Use --mode instead.
   *    Removed the -h short option.  Use --headers instead.

Changelog for mk-show-grants:

2009-05-03: version 1.0.15

   * The tool crashed when there were no users (issue 359).

Changelog for mk-slave-delay:

2009-05-03: version 1.0.13

   * Removed the --askpass long option.  Use --ask-pass instead.
   * Removed the --setvars long option.  Use --set-vars instead.
   * Removed the --usemaster long option.  Use --use-master instead.
   * Removed the -d short option.  Use --delay instead.
   * Removed the -c short option.  Use --continue instead.
   * Removed the -q short option.  Use --quiet instead.
   * Removed the -t short option.  Use --run-time instead.
   * Removed the --time long option.  Use --run-time instead.
   * Removed the -u short option.  Use --use-master instead.
   * Removed the -i short option.  Use --interval instead.
   * Added the -q short option for --quiet.
   * Added the --config option for issue 231.
   * Added the --log option for issue 241.
   * Added the following options for issue 248:
   *    --charset (-A)
   *    --defaults-file (-F)
   *    --host (-h)
   *    --password (-p)
   *    --port (-P)
   *    --socket (-S)
   *    --user (-u)
   * Converted script to runnable module (issue 315).

Changelog for mk-slave-find:

2009-05-03: version 1.0.6

   * Removed the --print long option; replication hierarchy tree always printed.
   * Removed the --setvars long option.  Use --set-vars instead.
   * Removed the --askpass long option.  Use --ask-pass instead.
   * Removed the -r short option.  Use --recurse instead.
   * Added the --config option for issue 231.
   * Converted script to runnable module (issue 315).
   * Defaults files were not read properly.
   * Added ability to specify master host with DSN.
   * Updated POD to describe what script actually does.
   * Updates to shared code.

Changelog for mk-slave-move:

2009-05-03: version 0.9.7

   * Removed the --setvars long option.  Use --set-vars instead.
   * Removed the --askpass long option.  Use --ask-pass instead.
   * Removed the -m short option.  Use --timeout instead. 
   * Added the --config option for issue 231.
   * Added the following options for issue 248:
   *    --charset (-A)
   *    --defaults-file (-F)
   *    --host (-h)
   *    --password (-p)
   *    --port (-P)
   *    --socket (-S)
   *    --user (-u)
   * Converted script to runnable module (issue 315).
   * Updates to shared code.

Changelog for mk-slave-prefetch:

2009-05-03: version 1.0.7

   * Removed the --askpass long option.  Use --ask-pass instead.
   * Removed the --checkint long option. Use --check-interval instead.
   * Removed the --iolag long option.  Use --io-lag instead.
   * Removed the --maxquerytime option.  Use --max-query-time instead.
   * Removed the --setvars long option.  Use --set-vars instead.
   * Removed the --numprefix long option.  Use --num-prefix instead.
   * Removed the --permitregexp long option.  Use --permit-regexp instead.
   * Removed the --printnonrewritten long option.  Use --print-nonrewritten
     instead.
   * Removed the --querysampsize long option.  Use --query-sample-size instead.
   * Removed the --rejectregexp long option.  Use --reject-regexp instead.
   * Removed the -i short option.  Use --check-interval instead.
   * Removed the -x short option.  Use --execute instead.
   * Removed the -l short option.  Use --io-lag instead.
   * Removed the -q short option.  Use --max-query-time instead.
   * Removed the -o short option.  Use --offset instead.
   * Removed the -t short option.  Use --run-time instead.
   * Removed the --time long option.  Use --run-time instead.
   * Removed the -w short option.  Use --window instead.
   * Added the --config option for issue 231.
   * Added the --log option for issue 241.
   * --errors did not work properly.
   * Converted script to runnable module (issue 315).
   * --print and --daemonize are no longer mutually exclusive.

Changelog for mk-slave-restart:

2009-05-03: version 1.0.13

   * Added the --log option for issue 241.
   * Added the --config option for issue 231.
   * Added the --help and --verbose options for issue 318.
   * Removed the --setvars long option.  Use --set-vars instead.
   * Removed the --askpass long option.  Use --ask-pass instead.
   * Removed the -L short option.  Use --error-length instead.
   * Removed the -E short option.  Use --error-text instead.
   * Removed the -M short option.  Use --max-sleep instead.
   * Removed the --maxsleep long option.  Use --max-sleep instead.
   * Removed the -m short option.  Use --min-sleep instead.
   * Removed the --minsleep long option.  Use --min-sleep instead.
   * Removed the -r short option.  Use --recurse instead.
   * Removed the -k short option.  Use --skip-count instead.
   * Removed the --skipcount long option.  Use --skip-count instead.
   * Removed the -s short option.  Use --sleep instead.
   * Removed the -t short option.  Use --run-time instead.
   * Removed the --time long option.  Use --run-time instead.
   * Removed the --untilmaster long option.  Use --until-master instead.
   * Removed the --untilrelay long option.  Use --until-relay instead.
   * Removed the -v short option.  Use --verbose instead.
   * Converted script to runnable module (issue 315).

Changelog for mk-table-checksum:

2009-05-03: version 1.2.5

   * Columns with backticks in comments caused a crash (issue 330)

Changelog for mk-table-sync:

2009-05-03: version 1.0.15

   * Columns with backticks in comments caused a crash (issue 330)
   * Added --lock-and-rename (issue 363).

Changelog for mk-visual-explain:

2009-05-03: version 1.0.14

   * Changed the --askpass option to --ask-pass.
   * Changed the --clusterpk option to --clustered-pk.
   * Changed the --setvars option to --set-vars.
   * Removed the -C short option.  Use --clustered-pk instead.
   * Removed the -c short option.  Use --connect instead.
   * Removed the -f short option.  Use --format instead.
   * Added config file handling and --config (issue 231).
   * Converted script to runnable module (issue 315).

Written by Xaprb

May 4th, 2009 at 12:00 am

Posted in Maatkit, Perl, SQL, Sys-Admin
