When can I have a big server in the cloud?

I was at a conference recently talking with a Major Cloud Hosting Provider and mentioned that for database servers, I really want large instances, quite a bit larger than the largest I can get now. The lack of cloud servers with lots of memory, many fast cores, and fast I/O and network performance leads to premature sharding, which is costly. A large number of applications can currently run on a single real server, but would require sharding to run in any of the popular cloud providers’ environments.

» Continue Reading (about 800 words)

What's wrong with MMM?

I am not a fan of the MMM tool for managing MySQL replication. This is a topic of vigorous debate among different people, and even within Percona not everyone feels the same way, which is why I’m posting it here instead of on an official Percona blog. There is room for legitimate differences of opinion, and my opinion is just my opinion. Nonetheless, I think it’s important to share, because a lot of people think of MMM as a high availability tool, and that’s not a decision to take lightly.

» Continue Reading (about 1100 words)

Why high-availability is hard with databases

A lot of systems are relatively easy to make HA (highly available). You just slap them into a well-known HA framework such as Linux-HA and you’re done. But databases are different, especially replicated databases, especially replicated MySQL. The reason has to do with some properties that hold for many systems, but not for most databases. Most systems that you want to make HA are relatively lightweight and interchangeable, with little to zero statefulness, easy to start, easy to stop, don’t care a lot about storage (or at least don’t write a lot of data; that’s usually delegated to the database), and there’s little or no harm done if you ruthlessly behead them.

» Continue Reading (about 500 words)

MySQL disaster recovery by promoting a replica

I was just talking to someone who backs up their MySQL servers once a day with mysqldump, and I said in a catastrophe, you’re going to have to reload from a backup; that’s some amount of downtime, plus up to a day of lost data. And they said “We can just promote a replica, we’ve done it before. It works fine.” Granted, in some/many cases, this is fine. There are all sorts of caveats – for example, you either know that your slave has the same data as the master or you don’t care.

» Continue Reading (about 200 words)