Failure scenarios and solutions in master-master replication
I’ve been thinking recently about the failure scenarios of MySQL replication clusters, such as master-master pairs or master-master-with-slaves. There are a few tools that are designed to help manage failover and load balancing in such clusters, by moving virtual IP addresses around. The ones I’m familiar with don’t always do the right thing when an irregularity is detected. I’ve been debating what the best way to do replication clustering with automatic failover really is.
I’d like to hear your thoughts on the following question: what types of scenarios require what kind of response from such a tool?
I can think of a number of failures. Let me give just a few simple examples in a master-master pair:
- Problem: Query overload on the writable master makes mysqld unresponsive
  - Do nothing. Moving the queries to another server will cause cascading failures.
- Problem: The writable master is completely unreachable
  - Fence the writable master and promote the standby master.
- Problem: The writable master is reachable but unresponsive due to overload-induced swapping
  - Do nothing. Moving the load to another server will cause cascading failures.
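To make the three examples above concrete, here is a minimal Python sketch of the decision logic. The names (`MasterHealth`, `Action`, `decide`) are hypothetical and don't come from any existing failover tool, and how each health flag would actually be measured is left as an assumption. Anything the examples don't cover is deliberately returned as undecided rather than guessed at.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DO_NOTHING = "do nothing"
    FENCE_AND_PROMOTE = "fence the writable master and promote the standby"
    UNDECIDED = "not covered by the examples above"


@dataclass
class MasterHealth:
    """Hypothetical health snapshot of the writable master."""
    host_reachable: bool     # assumption: e.g. the host answers on the network
    mysqld_responsive: bool  # assumption: e.g. a trivial query returns in time
    overloaded: bool         # assumption: query overload or swapping detected


def decide(health: MasterHealth) -> Action:
    """Map a health snapshot to the response listed in the examples."""
    if not health.host_reachable:
        # Completely unreachable: fence it, then promote the standby.
        return Action.FENCE_AND_PROMOTE
    if health.mysqld_responsive:
        # Reachable and answering queries: nothing to fix (assumed trivial case).
        return Action.DO_NOTHING
    if health.overloaded:
        # Unresponsive because of load or swapping: moving the load
        # elsewhere would only cascade the failure.
        return Action.DO_NOTHING
    # Reachable, unresponsive, and not overloaded: an open question.
    return Action.UNDECIDED


if __name__ == "__main__":
    snapshot = MasterHealth(host_reachable=True,
                            mysqld_responsive=False,
                            overloaded=True)
    print(decide(snapshot).value)
```

The point of writing it this way is that the cause of the unresponsiveness, not just the symptom, drives the response: two of the three examples look like failures to a naive health check but call for no failover at all.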
I don’t want to bias the jury, so I’ll stop there and ask you to contribute your failure scenarios and what you think the correct action should be.
High Performance MySQL, Second Edition: Replication, Scaling and High Availability
Continuing in the tradition, which I hope has been as helpful to you as it has been to me, I’m opening the floor for suggestions on chapter 9 of the upcoming High Performance MySQL, Second Edition. Unlike the other chapters for which I’ve listed outlines, this one isn’t substantially written yet. It’s in detailed outline form at this point (a tactic that has worked very well for us so far — I’ll write about that someday).
I’m trying to get feedback much earlier in this chapter’s lifecycle, for several reasons. Two of the most important are that this is one of the first chapters I’ve had a chance to really take from scratch, and the chapters I haven’t written from scratch have been harder to organize, as you’ve probably seen from the last few outlines I posted. There’s a lot of value in working top-down on this deep encyclopedia-style material.
The outline, as it stands now, is basically headings with bulleted lists of important details. Here are the top-level headings:
- [Intro]
- Scaling and High Availability Requirements
- Replication Overview
- Configuring Replication
- Under the Hood of Replication
- Replication Topologies
- Replication Administration and Maintenance
- Replication Problems and Solutions
- The Future of MySQL Replication
- Scaling MySQL Horizontally
- Clustering with MySQL
- MySQL Cluster
- Other Clustering Solutions
- Load Balancing
Just a few notes. These sections are top-level, and will likely be split into many sub-sections like other chapter outlines I’ve posted. A typical section has a couple dozen bullet-points in it, at a high level of granularity, such as “Using DRBD for log replication only.” I think we’ll also add in a separate section on fail-over and fail-back, but that’s not in the outline as of right now (what do you think belongs in it?).
I don’t know what it’s like for you to read outlines and see little bits of the book being assembled, but the process of writing this book is just fascinating to me. It’s endlessly interesting and educational — just the process of writing, let alone the subject matter! This is a really fun project. A heck of a lot of work, but fun nonetheless, and the openness of the project makes it even more fun for me. I’ve learned a lot of surprising and interesting things about writing. I keep wishing I had time to write about this process, but I really need to keep my eye on the deadlines and put that off for later.
Anyway, the usual requests apply: what’s missing, and what do you think is cool and should be included? Thanks, as usual, for your time and feedback.


