Archive for the ‘Perl’ Category
Maatkit does more than just MySQL. I’ve just committed a new version of mk-query-digest, a powerful log-analysis tool, with support for Postgres logs in both syslog and stderr format. I’m hoping that people will give this a spin in the real world. I have lots of test cases, but that’s never enough; I’m looking for people to crunch their logs and let me know if anything breaks.
A brief tutorial:
# Get it
$ wget http://www.maatkit.org/trunk/mk-query-digest
# Run it
$ perl mk-query-digest --type pglog /path/to/log/file
# Learn about it (search for the string "pglog")
$ perldoc mk-query-digest
I’m going to close comments on this blog post so I don’t get bug reports in the comments. If you have feedback, please post it to the Maatkit mailing list, or the Maatkit issue tracker. Or reply to the thread I just started on the Postgres mailing list.
The mk-loadavg tool is a helpful way to gather information about infrequent conditions on your database server (or any other server, really). We wrote it at Percona to help with recurring cases like “every two weeks, my database stops processing queries for 30 seconds, but it’s not locked up, and during this time nothing is happening.” That’s pretty much impossible to catch in action, and such conditions can take months to resolve without the aid of good tools.
In this blog post I’ll illustrate a very simple usage of mk-loadavg to help solve a much smaller problem: finding out what is happening on the database server during CPU spikes that occur every so often.
First, set everything up.
- Start a screen session: screen -S loadmonitoring. If you don’t have screen, you can run mk-loadavg as a daemon, but in my opinion it’s much better to use screen.
- Get mk-loadavg. For the purposes of this blog post, I’m going to get the latest trunk code, because I know a bug or two has been fixed since the last release.
- Create a directory to hold the collected information in files.
Next, let’s set up a script that mk-loadavg can use to gather some information when it detects a high-CPU condition. Save the contents of this script as “collect-stats.sh”. The script will gather about 30 seconds’ worth of statistics. It uses a simple sentinel file, /tmp/gatherinfo, to prevent multiple occurrences from gathering statistics at the same time. (This is intentionally simple for demo purposes.)
#!/bin/bash
if [ -f /tmp/gatherinfo ]; then exit 0; fi
touch /tmp/gatherinfo
d=`date +%F-%T`
echo "gathering info for $d"
ps -eaf >> collected/$d-ps 2>&1 &
top -bn1 > collected/$d-top 2>&1 &
mysql -e 'show innodb status\G show full processlist\G' >> collected/$d-innodbstatus 2>&1 &
vmstat 1 30 > collected/$d-vmstat 2>&1 &
iostat -dx 1 30 > collected/$d-iostat 2>&1 &
mysqladmin ext -i1 -c30 > collected/$d-mysqladmin 2>&1 &
sleep 30
ps -eaf >> collected/$d-ps 2>&1 &
mysql -e 'show innodb status\G show full processlist\G' >> collected/$d-innodbstatus 2>&1 &
rm /tmp/gatherinfo
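One caveat with the simple sentinel: if the script is killed partway through, /tmp/gatherinfo never gets removed, and all future collections are silently skipped until someone deletes it by hand. Here is a minimal sketch of a more forgiving variant using bash’s EXIT trap (the LOCK variable name is my own):

```shell
#!/bin/bash
# Same sentinel idea as collect-stats.sh, but with an EXIT trap so the
# lock file is removed on any exit path: normal completion, an error,
# or a catchable signal like INT or TERM.
LOCK=/tmp/gatherinfo
if [ -f "$LOCK" ]; then exit 0; fi
touch "$LOCK"
trap 'rm -f "$LOCK"' EXIT
# Route INT/TERM through exit so the EXIT trap fires for them too.
trap 'exit 1' INT TERM

# ... start the same background collectors as in collect-stats.sh,
# then sleep 30 to let them finish; no explicit rm is needed.
```

The `rm -f` in the trap is idempotent, so it is harmless if the EXIT trap fires more than once.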
Now make the script executable: chmod +x collect-stats.sh. At this point we’re ready to start working. Let’s fire the stats-collection script when the system’s user CPU goes above 40%.
perl mk-loadavg --watch "Server:vmstat:us:>:40" --interval 1 --execute collect-stats.sh
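Under the hood, that --watch specification just means: sample vmstat periodically, read the us (user CPU) column, and fire the --execute script when the comparison matches. Here is a rough shell sketch of the idea — not mk-loadavg’s actual implementation — assuming standard Linux vmstat output, where us is field 13:

```shell
#!/bin/bash
# Rough sketch of what --watch "Server:vmstat:us:>:40" does conceptually.
# This is NOT mk-loadavg's implementation, just the idea behind it.

# The "us" (user CPU) column is field 13 in standard Linux vmstat output.
user_cpu_from_vmstat_line() {
    echo "$1" | awk '{print $13}'
}

watch_cpu() {
    local threshold=$1
    while true; do
        # "vmstat 1 2" prints two samples; the second reflects current load.
        local us
        us=$(user_cpu_from_vmstat_line "$(vmstat 1 2 | tail -n 1)")
        if [ "$us" -gt "$threshold" ]; then
            ./collect-stats.sh
        fi
    done
}

# Usage: watch_cpu 40
```

mk-loadavg itself does this far more robustly, with multiple watch types, intervals, and daemon support.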
If the CPU goes over 40%, you'll get a bunch of files in the collected directory, with helpful information for diagnosing the problem. This example usage is pretty similar to a real-life one I set up recently. It enabled me to take a methodical approach to the problem:
- From the top output, I was able to identify that MySQL was causing the spike.
- I then looked at the SHOW STATUS output to see what the database server was doing, using mext as a helper. From Handler_read_rnd_next I isolated table scans as a problem.
- From the saved SHOW PROCESSLIST, I found the problem queries and optimized them.
You would be right if you said there are much better tools for finding problem queries, but remember two things: 1) sometimes clients ask for the lightweight, short-term solution that can be set up in about 5 minutes and checked the next day; and 2) when it is unclear that queries are the problem, setting up only a query monitor is stabbing in the dark and will not get results.
In addition to watching vmstat to measure system CPU usage, mk-loadavg can watch many other things, such as the MySQL server's SHOW PROCESSLIST, values parsed from SHOW INNODB STATUS, and so on.
Maatkit version 4334 is ready for download. I see that I missed posting a release announcement about last month’s release of Maatkit. I’ll try to cover the important bits about the last two releases here. Daniel has been posting the release announcements to the mailing list recently, so I’ll do a bit of copy and paste of what he said too.
We’ve released two new tools: mk-upgrade and mk-log-player. They are not actually new scripts, but we have only just added them to the releases. mk-upgrade is the tool that I’ve been blogging and writing about recently. Several people sponsored its development, and some of our clients are using it to mitigate the risk of an upgrade or other change to their production environments. mk-log-player has actually been around for something like a year, and was used by one of our clients who makes a high-performance appliance. The intention of the tool is to apply a realistic production load to a system in a predictable fashion.
As always, mk-query-digest is one of the tools we put the most work into. There are several new features in this release, including the ability to parse binary logs, the ability to analyze memcached traffic, and a ton of work on parsing tcpdump output.
We also realized that when we added configuration files to all the tools, we failed to test on Windows. Naturally, any time you don’t test something, that means that you have broken it. And indeed, all the tools immediately failed to run on Windows, but none of us use them on Windows, so we didn’t notice it until much later. We have fixed that.
At the last minute this month we also added a section to the documentation in each tool, which explains the risks of using that tool. These are power tools for power users, but I still felt that it was appropriate to disclose all the risks involved with using the tool.
Now on to last month’s release.
Last month the big news was that we finished the cleanup of command-line options that we had been working on for several months. As a result, a lot of the tools changed in ways that are not backwards compatible. However, we now have a documented command-line convention, and going forward all of the tools will be very consistent and easy to understand. Maatkit users voted for this on the mailing list, so we felt pretty good about making this incompatible change.
I’m not going to duplicate the change logs as I usually do in these blog posts. I think I’ll leave that for the mailing list announcements. At some point, we are also going to try to get the change logs online on maatkit.org.
Here are links to the two threads on the mailing list that explain the exact changes: