Archive for the ‘MMM’ tag
What kind of High Availability do you need?
Henrik just wrote a good article on different ways of achieving high availability with MySQL. I was going to respond in the comments, but decided it was better not to post such a long comment there.
One of the questions I think is useful to ask is what kind of high availability is desired. It is quite possible for a group of several people to stand in a hallway and talk about high availability, all of them apparently discussing the same thing but really talking about very different things.
Henrik says “At MySQL/Sun we recommended against asynchronous replication as a HA solution so that was the end of it as far as MMM was concerned. Instead we recommended DRBD, shared disk or MySQL Cluster based solutions.” Notice that all of those are synchronous technologies (at least, the way MySQL recommended them to be configured), generally employed to ensure a specific desirable property: no loss of data. But “I must not lose any committed transaction” and “my database must be available” are actually orthogonal requirements. One is about availability, the other about durability.
A lot of people who say they want High Availability actually want High Durability.
There are a great many MySQL users for whom writes are much less valuable than reads. I would point to an advertising-supported website as a canonical example. If the system isn’t available — that is, available to serve read queries — then a lot of money is lost. If someone’s latest comment on a blog post is lost — who cares? Money continues to flow.
This is why a lot of people want a system that keeps the database online, even if some writes are lost. Note that losing writes is a durability problem, not a consistency problem; for most users’ purposes, consistency and durability are also orthogonal. So we aren’t talking about eventual consistency or any of the other buzzwords, but simply “the system must respond to read queries.”
Asynchronous replication is well suited to many such users’ availability requirements, as long as replication does not fail (halt) through a write conflict or some other failure mode. (It is often perfectly acceptable for it to fail in other ways, as long as it does not halt.) That’s why a lot of users are interested in the specific type of “high availability” that a system such as MMM is intended to provide (but, as I mentioned, actually doesn’t provide). In other words, MMM would be great for a lot of people, if it worked correctly.
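Whether asynchronous replication has halted is visible on the replica itself. The Python sketch below is only an illustration, not part of MMM: it assumes one row of SHOW SLAVE STATUS output has already been fetched into a dict keyed by column name (the database connection is left out), and the function names replication_halted and describe are hypothetical. A write conflict typically stops the SQL thread with an error such as 1062 (duplicate entry), which shows up as Slave_SQL_Running = No.

```python
from typing import Mapping, Optional


def replication_halted(status: Mapping[str, Optional[str]]) -> bool:
    """True if the replica's SQL thread is not running.

    `status` is assumed to be one row of SHOW SLAVE STATUS,
    already fetched into a dict keyed by column name.
    """
    return status.get("Slave_SQL_Running") != "Yes"


def describe(status: Mapping[str, Optional[str]]) -> str:
    """Hypothetical one-line summary for a monitoring check."""
    if replication_halted(status):
        # A halted SQL thread (e.g. a duplicate-key conflict) needs a human.
        errno = status.get("Last_SQL_Errno") or "unknown"
        return f"halted (Last_SQL_Errno={errno}): needs a human"
    # Seconds_Behind_Master is NULL (None here) when the lag is unknown.
    lag = status.get("Seconds_Behind_Master")
    return "running, lag unknown" if lag is None else f"running, lag={lag}s"


print(describe({"Slave_SQL_Running": "No", "Last_SQL_Errno": "1062"}))
```

The column names follow the older MySQL terminology used in this post; newer MySQL releases rename them, so check the documentation for your version before relying on the exact keys.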
I have also been exposed to applications for which this kind of availability-trumps-durability paradigm is absolutely unacceptable. The advertising system upon which the advertising-supported website relies for its income is a good example. Users know they can build sites that only need to be available for reads, precisely because they are trusting that Google AdSense is highly available for writes! Delegating writes to someone else is the easiest way to build highly available systems.
There is a place for DRBD and MySQL Cluster, and there are also many situations that are served by neither the DRBD nor the MMM type of solution.
Josh Berkus wrote a while back about three types of cluster users, as opposed to three types of clusters. I think it’s helpful to approach the conversation from that angle sometimes too. As a consultant, I almost always do that when I enter a discussion with a customer who wants a “cluster” or “high availability.” Those are basically code phrases that tell me I need to start at the beginning and ensure we are all talking about the same requirements!
I also agree with Henrik about the need to turn off automatic failover. In many, many situations this is by far the best approach. Sometimes people state requirements that, if one steps back and looks at them afresh, quite obviously indicate that an automatic failover is the last thing that’s desirable. For example, if someone tells me that he expects failover to be required less than once a year, this is almost guaranteed not to be a good case for automatic failover. A system that’s tested so infrequently is almost certainly not going to work right when it’s needed. In such cases, it’s far better to leave everything alone until an expert human can resolve the problem, rather than have a stupid machine destroy what would otherwise be a fixable system.
Why high-availability is hard with databases
A lot of systems are relatively easy to make HA (highly available). You just slap them into a well-known HA framework such as Linux-HA and you’re done. But databases are different, especially replicated databases, especially replicated MySQL.
The reason has to do with some properties that hold for many systems, but not for most databases. Most systems that you want to make HA are relatively lightweight and interchangeable, with little or no state. They are easy to start and easy to stop, they don’t care a lot about storage (or at least don’t write a lot of data; that’s usually delegated to the database), and there’s little or no harm done if you ruthlessly behead them. The classic example is a web server or even most application servers. Most of the time these things are all about CPU power and network bandwidth. If I were to compare them to a car, I’d say they are like Matchbox cars: there are many of them, and they are cheap and easy to replace.
Databases are different. With or without replication, you’re looking at a system that is complex, stateful, heavyweight, and cares a lot about storage. It runs on bigger hardware with fast disks and a lot of memory. It’s usually disk-bound, and it does a lot of writes. It’s hard to start — it takes a long time to warm up and really get ready to serve production workloads (many minutes, hours, or even days). It tends to run with a lot of data in memory in a dirty state, so shutdown is slow, because a clean shutdown requires flushing a bunch of data to disk. If you yank its power plug or kill-dash-nine it, it’ll have to perform recovery on startup, which slows the startup process even more. If I were to compare a database server to a car, I wouldn’t even use a car as the analogy: I’d use one of those big-ass mining trucks. If your mining truck breaks down, you don’t just toss it in the trash and pull another off the shelf.
The problem with a lot of HA solutions is that they want to deal with inconsistencies or irregularities by killing the resource and replacing it in another location. This works fine with web servers, but not with database servers. Doing that will cause serious pain and downtime, defeating the point of HA. And when you add replication into the mix, it gets even worse. A system that wants to manage replication needs to deal with very complex conditions. A lot of replication failures are delicate matters that require skilled human intervention to solve. The HA solution must insulate the application from the misbehaving resource, but leave it running so the human can handle things.
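The “insulate the application, but leave the server running” rule can be sketched as a tiny pool object. Everything here is hypothetical (the Pool class, its isolate and restore methods, and the server names are not taken from Linux-HA, MMM, or any other real tool); the point is only that the response to a misbehaving database removes it from the set of servers the application uses and raises an alert, and never stops or kills mysqld.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Pool:
    """Hypothetical set of database servers the application may use."""
    active: Set[str] = field(default_factory=set)
    quarantined: Set[str] = field(default_factory=set)
    alerts: List[str] = field(default_factory=list)

    def isolate(self, server: str, reason: str) -> None:
        """Stop routing traffic to `server` without touching the server itself.

        There is deliberately no stop/kill/restart here: warm caches,
        dirty pages, and replication state stay intact for an operator.
        """
        if server not in self.active:
            return  # already isolated (or unknown): nothing to do
        self.active.discard(server)
        self.quarantined.add(server)
        self.alerts.append(f"{server} isolated ({reason}); left running for a human")

    def restore(self, server: str) -> None:
        """Called by the human once the problem has been fixed."""
        if server in self.quarantined:
            self.quarantined.discard(server)
            self.active.add(server)


pool = Pool(active={"db1", "db2"})
pool.isolate("db1", "replication halted")
print(sorted(pool.active), pool.alerts)
```

Contrast this with the web-server style of recovery, which would kill the process and start a fresh one somewhere else; here the only thing that moves is the traffic.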
This is not the way most applications are made HA. It’s different with databases, and it’s much harder.
Failure scenarios and solutions in master-master replication
I’ve been thinking recently about the failure scenarios of MySQL replication clusters, such as master-master pairs or master-master-with-slaves. There are a few tools that are designed to help manage failover and load balancing in such clusters, by moving virtual IP addresses around. The ones I’m familiar with don’t always do the right thing when an irregularity is detected. I’ve been debating what the best way to do replication clustering with automatic failover really is.
I’d like to hear your thoughts on the following question: what types of scenarios require what kind of response from such a tool?
I can think of a number of failures. Let me give just a few simple examples in a master-master pair:
- Problem: Query overload on the writable master makes mysqld unresponsive
  - Response: Do nothing. Moving the queries to another server will cause cascading failures.
- Problem: The writable master is completely unreachable
  - Response: Fence the writable master and promote the standby master.
- Problem: The writable master is reachable but unresponsive due to overload-induced swapping
  - Response: Do nothing. Moving the load to another server will cause cascading failures.
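The three responses above can be written down as a small decision function, which also makes it easy to see where new scenarios would slot in. This is a hypothetical Python sketch, not the logic of MMM or any other tool; it reduces each observation to two booleans, host_reachable and mysqld_responsive, and treats “do nothing” as “take no automatic action and alert a human.”

```python
from enum import Enum


class Action(Enum):
    HEALTHY = "no failure detected"
    DO_NOTHING = "take no automatic action; alert a human"
    FENCE_AND_PROMOTE = "fence the writable master, then promote the standby"


def choose_action(host_reachable: bool, mysqld_responsive: bool) -> Action:
    """Map the writable master's observed state to a response.

    Covers only the three scenarios listed above; anything more
    subtle would need additional inputs.
    """
    if not host_reachable:
        # Completely unreachable: fence first so the old master cannot keep
        # accepting writes, then move the writer role to the standby.
        return Action.FENCE_AND_PROMOTE
    if not mysqld_responsive:
        # Reachable but unresponsive (query overload or swapping): moving
        # the load to the standby would only cascade the failure.
        return Action.DO_NOTHING
    return Action.HEALTHY


for reachable, responsive in [(True, False), (False, False), (True, True)]:
    print(reachable, responsive, choose_action(reachable, responsive).name)
```

The sketch cannot tell query overload from swapping, and it does not need to, because both get the same response; a scenario that called for a different response would need a new input.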
I don’t want to bias the jury, so I’ll stop there and ask you to contribute your failure scenarios and what you think the correct action should be.


