Archive for the ‘Cloud Computing’ tag
I was at a conference recently talking with a Major Cloud Hosting Provider and mentioned that for database servers, I really want large instances, quite a bit larger than the largest I can get now. The lack of cloud servers with lots of memory, many fast cores, and fast I/O and network performance leads to premature sharding, which is costly. A large number of applications can currently run on a single real server, but would require sharding to run in any of the popular cloud providers’ environments. And many of those applications aren’t growing rapidly, so by the time they outgrow today’s hardware we can pretty much count on simply upgrading and staying on a single machine.
The person I was talking to actually seemed to become angry at me, and basically called me an idiot. This person’s opinion is that no one should be running on anything larger than 4GB of memory, and anyone who doesn’t build their system to be sharded and massively horizontally scaled is clueless.
I’ve received similar push-back from a lot of cloud hosting providers. When I work through the math with clients, a lot of them don’t like the ultimate price/performance ratio offered by cloud hosting. Hype doesn’t drive everyone’s business decisions, so a lot of people are wisely staying far away from cloud hosting for their applications, or even moving whole applications out of cloud hosting into real hardware to consolidate machines and save a lot of money. Some of them are using flash storage devices such as Fusion-io to further lower their TCO (this isn’t the right answer for every app, though).
Why do cloud hosting providers work so hard to make everyone buy lots of anemic machines and shard their applications an order of magnitude more than is required? Why aren’t they jumping to offer really beefy instances? I think there are a couple of simple reasons.
First, they want to colocate virtual machines and over-provision, just as airlines sell more tickets than there are seats in the plane. It’s a numbers game: sell more capacity than you really have, and bet on some of the instances not using all resources allocated to them. Win! Of course, this is only possible with lots of small instances; the law of large numbers doesn’t work without lots of instances, and large instances can’t be colocated.
Cloud providers tend to dislike dedicated instances, which leads to the second reason: they don’t want to make strong claims about the availability of any particular machine. This is where the cloud paradigm of “you must build to recover from machines vanishing without warning” comes from. A dedicated beefy instance wouldn’t let the hosting provider push that responsibility onto the application.
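The overbooking bet described above can be made concrete with a binomial model. This is a minimal sketch under loudly stated assumptions: every instance is independently busy with the same probability, capacity is measured in abstract "slots", and the 64-slot host and 1.5x oversubscription are hypothetical numbers, not any provider's real policy. It compares selling 96 one-slot instances against selling three 32-slot instances on the same host.

```python
from math import comb


def overload_probability(sold: int, capacity: int, p_busy: float) -> float:
    """Probability that more than `capacity` of `sold` equal-sized instances
    are busy at the same moment, assuming each one is busy independently
    with probability `p_busy` (the upper tail of a binomial distribution)."""
    return sum(
        comb(sold, k) * p_busy**k * (1 - p_busy) ** (sold - k)
        for k in range(capacity + 1, sold + 1)
    )


# Hypothetical host with 64 slots of capacity, oversold 1.5x in both cases.
small = overload_probability(sold=96, capacity=64, p_busy=0.5)  # 96 x 1-slot instances
large = overload_probability(sold=3, capacity=2, p_busy=0.5)    # 3 x 32-slot instances
print(f"small instances: {small:.4%} of the time overloaded")
print(f"large instances: {large:.4%} of the time overloaded")
```

With these made-up numbers the small-instance host is overloaded well under 1% of the time, while the big-instance host is overloaded whenever all three tenants are busy at once, one time in eight. Same oversubscription ratio, very different risk; that is the law of large numbers at work, and it only pays off with lots of small instances.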
There are lots more reasons — all of them combining into one big overall “cloud application architecture best practice” — but I think those are two of the showstoppers.
I really think this is the wrong paradigm. People talk about the cloud being the technology of the future, but in many ways it’s pretty stone-age compared to what smart system architects can achieve with high-quality hardware and networking at a much lower cost, with very strong guarantees of performance, consistency, and availability.
Cloud computing is new enough that we don’t understand, in a collective sense, how to think about it. (I know that lots of individuals do, but as a whole, there isn’t much of a shared understanding.) The real value proposition that I want to see emerge from cloud computing is pretty much orthogonal to what everyone’s raving about these days. I want to see the DevOps engineering discipline build momentum around the idea that systems should be treated as services, with architectural components provisioned and controlled through APIs. That can be done completely independently of many of the characteristics of current cloud computing platforms (virtualization, ephemerality, horizontally scaled architectures, and so on).
And like most people, I’ve got an ego, and I don’t appreciate repeatedly being called a moron by cloud computing providers’ salespeople, who don’t know anything about running database servers. I can do math and understand price/performance, and I know the cost and difficulty of building a sharded application. I look forward to the day when I don’t have to just bite my tongue and walk on to the next booth. I look forward to cloud hosting providers advancing to the year 2005 or so. I’m sure it will happen as we figure this all out.
Feel free to comment, but don’t expect me to approve your comment if you’re from a cloud provider and you’re plugging your platform :)
I see that a lot of people just don’t get it when they start talking about high availability, redundancy, failover, etc. This is probably not going to change, but maybe I can try anyway.
Let’s think about how you can survive a massive Amazon AWS failure. You build your application to automatically move services to another part of the infrastructure that’s still up. Great! Now assume that everyone else is smart, too. Their applications move, too. What happens next?
The whole AWS cloud melts down. Have you never seen this happen, where one instance of something fails and others pick up the load and fail in turn? I have. OK, so let’s say that you’re really smart, and you also have the ability to move to an entirely different provider. Now suppose that other people are smart too. Next stop: Rackspace Cloud is down, and so is Joyent, and so on.
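Here is a toy model of that domino effect, a sketch with hypothetical numbers only. It assumes every site (availability zone, provider, whatever) has a fixed capacity, that failover spreads the orphaned load evenly across the surviving sites, and that any site pushed past its capacity fails outright. None of this models a real provider; the function name and parameters are made up for illustration.

```python
def cascade(capacities: list[float], loads: list[float], first_failure: int) -> list[int]:
    """Simulate failover after site `first_failure` dies.

    Returns the indices of every site that ends up failed, in failure order.
    """
    alive = set(range(len(capacities)))
    load = list(loads)
    failed_order: list[int] = []
    to_fail = [first_failure]
    while to_fail:
        for i in to_fail:
            alive.discard(i)
            failed_order.append(i)
        if not alive:
            break  # nothing left to fail over to
        # Everyone's clients "smartly" move to whatever is still up.
        orphaned = sum(load[i] for i in to_fail)
        for i in to_fail:
            load[i] = 0.0
        share = orphaned / len(alive)
        for i in alive:
            load[i] += share
        # Any survivor now over capacity fails in the next round.
        to_fail = [i for i in sorted(alive) if load[i] > capacities[i]]
    return failed_order


print(cascade([100] * 4, [80] * 4, first_failure=0))
print(cascade([100] * 4, [60] * 4, first_failure=0))
```

With four sites each at 80% of capacity, losing one pushes the other three to about 107% and they all go down; at 60%, the survivors land at 80% and the cascade stops. With n equal sites, each one has to run below (n-1)/n of its capacity (75% here) just to absorb a single failure, which is capacity planning whether or not anyone calls it that.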
You can’t just pretend that “the cloud” is infinite. It isn’t. Stop trying! In “the cloud,” you still have to do capacity planning, even though it’s hard or impossible, and you still have to think about the possibility that the resources you assume are there aren’t. Let’s think about cloud computing’s older name — utility computing. Can you think of any utilities that have had capacity shortages, brownouts, or even cascading failures? I worked a bunch of case studies on them in my engineering classes, but I also lived through some of them myself.
This is why some old-fashioned, stupid, clueless people still own their own hardware. Those dumb clod-jumpers aren’t hip enough to move into the cloud where everything is magical. I bet they have kerosene lanterns for when the lights go out, too.
With economies of scale come failures at scale. You can’t have it both ways.
This is a great book on how to build apps in the cloud! I was happy to see how much depth it went into. It’s short (150 pages plus some appendixes), so I was expecting it to be a superficial overview. But it isn’t; it is thorough. It is also obviously built on the author’s own experience building very specific applications that he uses to run his business. He isn’t preaching about stuff he doesn’t know first-hand. Finally, George Reese is a good writer! That is how he covers so much ground with so much depth in so few pages, and it all makes sense. He takes a side trip every now and then, but it’s always in the right place at the right time (how to do a snapshot for backups, for example) and isn’t distracting. For a technical book, it has an amazing narrative flow.
The book begins with an intro to cloud computing in general, with definitions and an explanation of different models, plus cost estimates of traditional IT, managed hosting, and cloud computing for an app. There’s a brief overview of the Amazon platform. This book is mostly about Amazon, and states that up front. There are references and comparisons to other providers throughout, and later there are two appendixes on GoGrid and Rackspace, each written by a representative of that company. I was happy that the author brought in people to write those, instead of doing it himself. They are non-promotional in nature, and quite short. They add value to the book, though honestly it would have been fine without them.
Back to Chapter 2 now: a deeper introduction to Amazon, moving through all the major components, but especially EC2, S3, and EBS. Here we also start to see a focus on the platform as a whole: availability zones, security, redundancy, reliability. These topics are treated fairly and woven into every chapter. It’s clear that the author doesn’t want to isolate these topics, but rather explain them in context so your mind is always on them as each new topic is introduced. Chapter 3 picks all this up again: considering a move into the cloud? More cost comparisons, more explanations of concepts such as availability and how they translate into the Amazon cloud. Performance, disaster recovery and a few other topics show up here.
Chapter 4 is about how to build an app in the cloud: web app design, making multiple machines work together, handling failure, building AMIs, privacy, and operating databases (especially MySQL) in the cloud. The privacy section is particularly good. I’d recommend this to anyone building an app that might process personally identifiable information or financial information, in or out of the cloud. And as I said already, this is one of the types of things he weaves into the whole book. Chapter 5 picks right up and keeps going: it’s about security. Data security, regulatory compliance, network security, host security, how to respond if there’s a breach. And then Chapter 6 is on disaster recovery: planning, implementing, managing.
Chapter 7 is titled “scaling,” but it’s more than that. It starts with capacity planning. Here’s one of my favorite quotes: “some think they no longer need to engage in capacity planning… [others] think of tens or hundreds of thousands of dollars in consulting fees. Both thoughts are dangerous myths…” There’s a reference to John Allspaw’s excellent book on capacity planning. (I saw that he was a tech reviewer for this book, too.) This chapter covers how you predict and provision for capacity needs in the cloud, including the “automatic scaling” holy grail, how it can bite you, and how to keep that from happening. It also talks about how you scale vertically in the cloud. It doesn’t talk about why it’s hard to really be sure about your capacity needs in the cloud, but that’s okay given the other material covered in the chapter.
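One common way automatic scaling bites is a runaway feedback loop: a bad metric or a traffic spike tells the scaler to keep adding instances, and the bill (or the provider's spare capacity) becomes the only limit. The sketch below is a generic proportional scaling rule with clamps, not the book's method and not any provider's autoscaling API; the function name, target utilization, bounds, and step size are all hypothetical.

```python
import math


def desired_instances(
    current: int,
    utilization: float,       # average fleet utilization, 1.0 == 100%
    target: float = 0.6,      # hypothetical target utilization, leaves headroom
    min_instances: int = 2,   # never scale below this
    max_instances: int = 20,  # hard ceiling: never scale past this automatically
    max_step: int = 4,        # never add or remove more than this per decision
) -> int:
    # Proportional rule: size the fleet so utilization would land on the target.
    raw = math.ceil(current * utilization / target)
    # Rate-limit the change so one bad reading can't swing the fleet wildly.
    stepped = max(current - max_step, min(current + max_step, raw))
    # Enforce the absolute bounds last.
    return max(min_instances, min(max_instances, stepped))


print(desired_instances(10, 3.0))  # a suspicious 300% reading on ten instances
print(desired_instances(4, 0.95))  # ordinary load increase
```

A 300% reading on ten instances would naively ask for 50; the clamps hold the change to four instances per decision and never exceed 20 in total, which buys time for a human to check whether the metric is real.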
And that’s it! After this come three appendixes: an AWS reference, and then the two on GoGrid and Rackspace.
What’s to criticize? Well, not a lot really. I read every word in this book, I promise. Here’s what I noticed: he talked about database corruption from unexpected shutdowns; he should have said “use InnoDB,” because that’s pretty much a MyISAM problem. He talked about taking backups from replication slaves; he should have said “don’t just trust replication, verify it with mk-table-checksum.” I also think he encourages a little too much trust that cloud providers will always magically have the capacity you need. It felt a bit naive, but this is actually a fundamental point in deciding whether to use the cloud at all. Nobody knows how much excess capacity Amazon has, and as we know, weird things happen. But if you’re going to embrace a cloud platform, you’re going to have to trust it to a certain extent.
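Why verify replication? A slave can silently drift from its master and still look healthy, so a backup taken from it may not match the master. The idea behind a checksum tool is to checksum the same table in chunks on both servers and compare. Below is a toy illustration of that idea only: it does not use or mimic mk-table-checksum's actual options or queries, the function names are hypothetical, and a real tool would compute checksums on the database servers rather than pulling every row into a script.

```python
import hashlib


def chunk_checksums(rows: list[tuple], chunk_size: int) -> dict[int, str]:
    """Checksum a table in primary-key ranges.

    Each row is a tuple whose first element is an integer primary key.
    Rows with keys in [k * chunk_size, (k + 1) * chunk_size) belong to chunk k,
    so one missing row doesn't shift every later chunk.
    """
    chunks: dict[int, list[tuple]] = {}
    for row in rows:
        chunks.setdefault(row[0] // chunk_size, []).append(row)
    sums = {}
    for chunk_id, chunk_rows in chunks.items():
        h = hashlib.sha1()
        for row in sorted(chunk_rows):  # identical order on both servers
            h.update(repr(row).encode("utf-8"))
            h.update(b"\n")
        sums[chunk_id] = h.hexdigest()
    return sums


def diverged_chunks(master_rows, replica_rows, chunk_size: int = 1000) -> list[int]:
    """Return the chunk ids whose contents differ between master and replica."""
    m = chunk_checksums(master_rows, chunk_size)
    r = chunk_checksums(replica_rows, chunk_size)
    # A chunk that exists on only one side also counts as diverged.
    return sorted(c for c in m.keys() | r.keys() if m.get(c) != r.get(c))


master = [(i, f"row{i}") for i in range(3000)]
replica = list(master)
replica[1500] = (1500, "stale")  # simulate silent drift on the replica
print(diverged_chunks(master, replica))
```

Changing one row in a 3,000-row table flags only the chunk that contains it, so you know where to look (or what to resync) instead of trusting the copy blindly or re-copying everything.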
A couple of other things to nitpick: in Chapter 1, when talking about availability, he writes “[if] even 1 minute of downtime in a year is entirely unacceptable, you almost certainly want to opt for a managed services environment… [if] 99.995% is good enough, you can’t beat the cloud.” But these numbers are unrealistic and don’t have enough context to explain what he means. (For scale, 99.995% availability still allows about 26 minutes of downtime a year, and 1 minute a year works out to roughly 99.9998%.) Finally, in a couple of places he talks about algorithms for generating unique identifiers and dealing with concurrent access, but these aren’t explained deeply enough to keep novices from shooting themselves in the foot with wrong assumptions, such as assuming that a timestamp will always increase between successive accesses. But a savvy developer will recognize those problems and won’t be bitten.
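To make the timestamp pitfall concrete: wall-clock time can stall or step backwards (for example after an NTP step correction), so an identifier built from the raw timestamp alone can repeat or go out of order. Below is a minimal, hypothetical generator (the class and its ID layout are illustrative, not the book's) that doesn't trust the clock to advance: if the clock hasn't moved past the last timestamp it handed out, it keeps that timestamp and bumps a sequence counter, and a node id keeps IDs from different machines apart. Its state lives in memory only, so it is not safe across a restart while the clock is behind; a real system would need to persist or reserve state for that.

```python
import threading
import time
from typing import Callable


class IdGenerator:
    """Hypothetical generator of unique, increasing (timestamp_ms, node_id, seq) IDs."""

    def __init__(self, node_id: int, clock: Callable[[], float] = time.time):
        self.node_id = node_id
        self.clock = clock  # injectable so the clock can be faked in tests
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> tuple[int, int, int]:
        with self.lock:
            now_ms = int(self.clock() * 1000)
            if now_ms > self.last_ms:
                # Clock moved forward: start a fresh sequence at the new time.
                self.last_ms = now_ms
                self.seq = 0
            else:
                # Same millisecond, or the clock went backwards:
                # stay on last_ms and bump the sequence instead of reusing an ID.
                self.seq += 1
            return (self.last_ms, self.node_id, self.seq)


fake_times = iter([1.0, 1.0, 0.5, 2.0])  # the third reading steps backwards
gen = IdGenerator(node_id=7, clock=lambda: next(fake_times))
ids = [gen.next_id() for _ in range(4)]
print(ids)
```

The fake clock steps back from 1.0 s to 0.5 s, and the generator still hands out increasing, unique IDs. Python's `time.monotonic()` never goes backwards either, but its values are only meaningful as differences within one process, so it can't replace the node id or persisted state across machines and restarts.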
This book is the first one to go onto my list of essential books in a while. I’ll be keeping this one on my own bookshelf.