Xaprb

Stay curious!

Archive for August, 2009

Failure scenarios and solutions in master-master replication

with 27 comments

I’ve been thinking recently about the failure scenarios of MySQL replication clusters, such as master-master pairs or master-master-with-slaves. There are a few tools that are designed to help manage failover and load balancing in such clusters, by moving virtual IP addresses around. The ones I’m familiar with don’t always do the right thing when an irregularity is detected. I’ve been debating what the best way to do replication clustering with automatic failover really is.

I’d like to hear your thoughts on the following question: what types of scenarios require what kind of response from such a tool?

I can think of a number of failures. Let me give just a few simple examples in a master-master pair:

Problem: Query overload on the writable master makes mysqld unresponsive
    Do nothing. Moving the queries to another server will cause cascading failures.

Problem: The writable master is completely unreachable
    Fence the writable master and promote the standby master.

Problem: The writable master is reachable but unresponsive due to overload-induced swapping
    Do nothing. Moving the load to another server will cause cascading failures.
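
To make the three examples above concrete, here is a minimal sketch of how a failover tool might map what it observes to a response. It is not taken from any existing tool. The function name, the `Action` values, and the three boolean inputs are all hypothetical, and anything the examples above don’t address is reported as "not covered" rather than guessed at.

```python
from enum import Enum


class Action(Enum):
    DO_NOTHING = "do nothing"
    FENCE_AND_PROMOTE = "fence the writable master and promote the standby"
    NOT_COVERED = "not covered by the examples in this post"


def choose_action(reachable: bool, responsive: bool, overloaded: bool) -> Action:
    """Map observations about the writable master to a response.

    The three flags are hypothetical inputs that a monitoring tool
    would have to supply; how to measure them reliably is a separate problem.
    """
    if not reachable:
        # The master is completely unreachable: fence it, then promote.
        return Action.FENCE_AND_PROMOTE
    if responsive:
        # Assumption: a master that still answers needs no intervention.
        return Action.DO_NOTHING
    if overloaded:
        # Unresponsive because of query overload or swapping: moving the
        # load to another server would cause cascading failures.
        return Action.DO_NOTHING
    # Reachable, unresponsive, and not overloaded: the post does not say.
    return Action.NOT_COVERED
```

For example, `choose_action(reachable=True, responsive=False, overloaded=True)` returns `Action.DO_NOTHING`. The interesting work is in the last branch, which is exactly where I’d like your scenarios.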

I don’t want to bias the jury, so I’ll stop there and ask you to contribute your failure scenarios and what you think the correct action should be.

Written by Xaprb

August 30th, 2009 at 3:08 pm

Posted in SQL


A script snippet for aggregating GDB backtraces

with 11 comments

Note: the bt-aggregate tool has been deprecated and replaced by the pmp tool, which can do all that and more.

A short time ago in a galaxy nearby, Domas Mituzas wrote about contention profiling with GDB stack traces. Mark Callaghan found the technique useful, and contributed an awk script (in the comments) to aggregate stack traces and identify which things are blocking most threads. I’ve used it myself a time or five. But I’ve found myself wanting it to be fancier, for various reasons. So I wrote a little utility that can aggregate and pretty-print backtraces. It can handle unresolved symbols, and aggregate by only the first N lines of the stack trace. Here’s an example of a mysqld instance that’s really, really frozen up:

bt-aggregate -4 samples/backtrace.txt | head -n12
2396 threads with the following stack trace:
        #0  0x00000035e7c0a4b6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1  0x00000000005f2bd8 in open_table ()
        #2  0x00000000005f3fb4 in open_tables ()
        #3  0x00000000005f4247 in open_and_lock_tables_derived ()

4 threads with the following stack trace:
        #0  0x00000035e7c0a4b6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1  0x0000000000780099 in os_event_wait_low ()
        #2  0x000000000077de42 in os_aio_simulated_handle ()
        #3  0x000000000074a261 in fil_aio_wait ()
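
The core idea of aggregating by the first N frames can be sketched in a few lines of Python. This is not the bt-aggregate source, only a rough illustration. It assumes the input looks like GDB backtrace output in which every stack begins with a `#0` frame; the function name and the regular expression are my own for this example.

```python
import re
from collections import Counter

# Matches frames such as "#1  0x00000000005f2bd8 in open_table ()" and
# "#0  main () at foo.c:1". Unresolved symbols simply come out as "??".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def aggregate_backtraces(text: str, depth: int) -> Counter:
    """Count stacks that share the same first `depth` function names."""
    stacks = Counter()
    current = []

    def flush():
        # Record the stack collected so far, truncated to `depth` frames.
        if current:
            stacks[tuple(current[:depth])] += 1
            current.clear()

    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # thread headers, blank lines, wrapped arguments
        if match.group(1) == "0":
            flush()  # frame #0 marks the start of a new stack
        current.append(match.group(2))
    flush()
    return stacks
```

Calling `aggregate_backtraces(text, 4).most_common()` gives the stacks ordered by how many threads share them, which is the same shape as the output shown above. Addresses are deliberately ignored, so threads parked at different offsets inside the same function are still grouped together.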

Written by Xaprb

August 30th, 2009 at 2:49 pm

Speaking about Maatkit at CPOSC

without comments

I’m going to present on Maatkit at the CPOSC conference in central Pennsylvania on Saturday, October 17th, 2009. I’ll give an overview of the toolkit, which is no longer an easy task in a single session. I see that a number of other interesting sessions have been accepted. It looks like it’ll be a good conference.

Written by Xaprb

August 29th, 2009 at 9:58 pm

Posted in Conferences,Maatkit,SQL
