When can I have a big server in the cloud?
I was at a conference recently talking with a Major Cloud Hosting Provider and mentioned that for database servers, I really want large instances, quite a bit larger than the largest I can get now. The lack of cloud servers with lots of memory, many fast cores, and fast I/O and network performance leads to premature sharding, which is costly. A large number of applications can currently run on a single real server, but would require sharding to run in any of the popular cloud providers’ environments. And many of those applications aren’t growing rapidly, so by the time they outgrow today’s hardware we can pretty much count on simply upgrading and staying on a single machine.
The person I was talking to actually seemed to become angry at me, and basically called me an idiot. This person’s opinion is that no one should be running on anything larger than 4GB of memory, and anyone who doesn’t build their system to be sharded and massively horizontally scaled is clueless.
I’ve received similar push-back from a lot of cloud hosting providers. When I work through the math with clients, a lot of them don’t like the ultimate price/performance ratio offered by cloud hosting. Hype doesn’t drive everyone’s business decisions, so a lot of people are wisely staying far away from cloud hosting for their applications, or even moving whole applications out of cloud hosting into real hardware to consolidate machines and save a lot of money. Some of them are using flash storage devices such as Fusion-io to further lower their TCO (this isn’t the right answer for every app, though).
Why do cloud hosting providers work so hard to make everyone buy lots of anemic machines and shard their applications an order of magnitude more than is required? Why aren’t they jumping to offer really beefy instances? I think there are a couple of simple reasons.
First, they want to colocate virtual machines and over-provision, just as airlines sell more tickets than there are seats in the plane. It’s a numbers game: sell more capacity than you really have, and bet on some of the instances not using all resources allocated to them. Win! Of course, this is only possible with lots of small instances; the law of large numbers doesn’t work without lots of instances, and large instances can’t be colocated. Cloud providers tend to dislike dedicated instances, which leads to the second reason. They don’t want to make strong claims about the availability of any particular machine. This is where the cloud paradigm of “you must build to recover from machines vanishing without warning” comes from. A dedicated beefy instance wouldn’t let the hosting provider push that responsibility onto the application.
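The law-of-large-numbers point can be illustrated with a rough sketch. It assumes a deliberately crude model that is not from any provider: each instance independently uses its full allocation with probability `p_busy` and nothing otherwise, and the provider owns enough hardware to cover mean demand plus three standard deviations. The function name and all numbers are hypothetical.

```python
import math

def capacity_needed(n_instances, p_busy, sigmas=3.0):
    """Fraction of the capacity sold that a provider must actually own.

    Assumption: each instance independently uses its whole allocation with
    probability p_busy, so aggregate demand is Binomial(n_instances, p_busy).
    The provider provisions mean + sigmas * standard deviation, capped at
    the total capacity sold.
    """
    mean = n_instances * p_busy
    std = math.sqrt(n_instances * p_busy * (1.0 - p_busy))
    needed = min(n_instances, mean + sigmas * std)
    return needed / n_instances

# Hypothetical 30% chance that any given instance is busy.
for n in (1, 10, 100, 10_000):
    print(n, round(capacity_needed(n, 0.3), 3))
```

Under these assumptions, a host carrying a single large instance has to own everything it sold, while the fraction falls toward the 30% average as the number of small instances grows. That gap is the margin over-provisioning bets on.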
There are lots more reasons — all of them combining into one big overall “cloud application architecture best practice” — but I think those are two of the showstoppers.
I really think this is a wrong paradigm. People talk about the cloud being the technology of the future, but in many ways it’s pretty stone-age compared to what smart system architects can achieve with high-quality hardware and networking at a much lower cost, with very strong guarantees of performance, consistency, and availability.
Cloud computing is new enough that we don’t understand, in a collective sense, how to think about it. (I know that lots of individuals do, but as a whole, there isn’t much of a shared understanding.) The real value proposition that I want to see emerge from cloud computing is pretty much orthogonal to what everyone’s raving about these days. I want to see the DevOps engineering discipline build momentum around the idea that systems should be treated as services, with architectural components provisioned and controlled through APIs. That can be done completely independently of many of the characteristics of current cloud computing platforms (virtualization, ephemerality, horizontally scaled architectures…)
And like most people, I’ve got an ego and I don’t appreciate repeatedly being called a moron by cloud computing providers’ sales people, who don’t know anything about running database servers. I can do math and understand price/performance, and I know the cost and difficulty of building a sharded application. I look forward to the day when I don’t have to just bite my tongue and walk on to the next booth. I look forward to cloud hosting providers advancing to the year 2005 or so. I’m sure it will happen as we figure this all out.
Feel free to comment, but don’t expect me to approve your comment if you’re from a cloud provider and you’re plugging your platform :)



Neither IaaS nor PaaS will ever be a suitable platform for an application whose architecture is a large single instance, where latency between bits of the same application is important. Something like, for example, a financial institution’s main ledger.
So when architecting an environment for such an application, a lot of effort goes into assessing and PoC-ing, and even then the sizing can be knocked sideways by an application code drop.
However, there are plenty of cases where the workload is parallel and fungible, and *can* be deployed to a virtualised environment. And in many organisations an awful lot of their workload satisfies this (while significant and important parts don’t). And at the same time, design choices made early in the application development cycle can radically change the eventual target environment.
A good architect will devise an appropriate environment for an application to run in, with the correct attributes.
Dunstan
10 Jun 11 at 11:50 am
It might help if you put some real numbers out there to let us know what you mean by “big”. I’ve seen vendors with servers with 68.4 GB of memory. I think that would qualify as “big” for a lot of people. But then again, for other people that’s definitely on the smaller end.
Michael Peters
10 Jun 11 at 1:51 pm
Michael, yes, today I can easily get a server with 24 fast cores, 256GB to 1TB of memory, and tens to hundreds of thousands of I/O operations per second. Even with a much less powerful server than that, I can run rings around anything I’ve ever benchmarked in any cloud platform. In particular, it would help a lot if I could get lots more RAM in cloud servers, because that would help make up for anemic I/O performance in a lot of workloads.
Xaprb
10 Jun 11 at 3:41 pm
I think as another commenter, Dunstan, alluded to, there just aren’t many requirements for large, collated systems. And when you’re doing things that large, it is going to be cheaper to do it in-house (economies of scale and such).
You’ve been taking quite a hostile stance towards cloud computing lately. Obviously, it’s not a one-size-fits-all solution. However, the big appeal of cloud computing is two-fold: It’s future-scalable and I Don’t Have To Think About It. Both of these are great for new startups who have a basic idea but little capital and can’t afford to roll out huge servers, and want to spend time and money on new development rather than administration. (As these startups mature their needs may change, and lock-in might be an issue, but that’s a separate discussion.)
The other benefit of cloud computing is that it supports the big-picture view of the internet as a distributed, interconnected service API for various data. When it comes down to it, we’re really concerned about data, not software, and the prevalence of small sites, services, projects (often hosted on cloud or small dedicated servers) sharing specialized information with each other is what the web is all about.
So no, cloud hosting is not appropriate for things like financial records, medical records, data carriers, search engines, or the NSA. But most people aren’t doing that stuff. Most people are making small, focused apps that someone will use as a service or integrate with some other, tangentially-related product. For example, my SVN hosting service automatically posts updates to my bug tracking service. I have another service that automatically syncs files between my computers–that’s all it does.
A lot of these services have APIs and integrate with one another. The point of these apps isn’t to create a huge product that does everything, nor to maintain complicated relationships and business logic that you’d find in a large organization like IBM or Google. The point is to create a small, focused app that does one thing really well. The APIs expose the data and allow users to create their own relations.
I think this notion of small, focused service providers is more organic, democratic, and entrepreneurial, and it’s certainly very popular. With a few smart people you can make a great product, and you don’t need the backing of an enterprise to do it–well, except for the one hosting your cloud services.
In the big-picture sense, I think this paradigm works well for startups and service providers. Obviously, it’s also important to use the right tool for the job, and sometimes (perhaps most times) the cloud is not it. But as you’ve mentioned, it’s novel technology, and we’re still learning when and where the cloud is the right tool.
Chris Bednarski
10 Jun 11 at 4:44 pm
You don’t even need to look at price/performance to see that Baron is right and the cloud is not for everyone. Even when you have a scalable system, single node performance matters. Perfectly linear scalability is exactly like 100% availability.
Peter Boros
10 Jun 11 at 4:52 pm
I am at a conference right now that had an entire day focused on clouds.
Baron is right; I would not build out a db in a cloud at all with the current costs and performance available.
Keith
10 Jun 11 at 5:28 pm
We at Recorded Future are running on Amazon and we have very much this requirement. If we were to scale horizontally more than we do and shard on, say, 4GB servers, that would be close to impossible; it would just require too many servers. Even horizontal scaling does not provide 100% scalability; at some point you will not be able to add power just by running more instances. Anyone thinking that many 4GB instances are what you need was probably involved when the 640K PC memory limit was introduced. How small can your servers be? Should the system be able to handle a massive number of Android phones? Or?
But in my mind, the #1 issue isn’t CPU or memory limitations, which are there and are real, but I/O and network bandwidth and instability.
Another issue is that so few database systems really are cloud enabled. Truly cloud enabled in the sense that servers can be added or removed at will, without downtime or application changes, as your needs change.
If I can dynamically add disks and expand my filesystems, why is it so difficult to seamlessly grow and shrink database performance?
But it is.
/Karlsson
Anders Karlsson
10 Jun 11 at 6:44 pm
What is the max IOPs you have seen MySQL do for InnoDB or MyISAM on real or bogus workloads? For my tests that might be CPU limited and were not run on new versions of InnoDB that support multiple buffer pool instances, I have not exceeded 40,000 IOPs for InnoDB and 80,000 for MyISAM, so I wonder how you can use hundreds of thousands of IOPs that some big servers provide via PCI-based flash.
Mark Callaghan
10 Jun 11 at 7:43 pm
Mark,
I don’t remember precisely, but I haven’t seen anything close to hundreds of thousands of IOPS, for sure. It was careless to mention that I can get 100k+ IOPS.
Xaprb
10 Jun 11 at 9:40 pm
Chris, I’m not taking a hostile stance towards the cloud, although someone says that pretty much every time I blog about cloud computing. I’m just not jumping on the hype band-wagon, and I’m not repeating generalities such as “you should use the right tool for the job” because it doesn’t add anything significantly valuable to readers. I am generally not saying much about cloud computing unless I think I’ve got something to add that others haven’t said (much) before.
I would contest the notions that the benefits are future-scalability and don’t-have-to-think-about-it. I think that’s the current hype, arguably unrealistic, and I think it’s quite different from what the value proposition will mature into. I hope so, anyway.
Xaprb
11 Jun 11 at 8:52 am
Nice post, validates what I’m seeing too. It’s mostly marketing smoke and mirrors.
The real trend here is consumers are moving from desktop applications to web applications thanks to the evolution of Javascript. Yes, “cloud computing” is a big trend.
Sure, Windows XP will go away eventually, no big revelation there. So that growth curve looks great, but now you have all this other junk (same old web hosting) clinging to the “cloud computing” magic carpet till the next buzzword comes along.
PJ Brunet
12 Jun 11 at 8:34 pm
The cloud vs. hardware debate at one level sounds a bit like the religious wars between DBMS types (centralized, vertically scaled) and CORBA types (distributed, horizontally scaled) in the 1990s. My own experience has been that distributed systems that involve shared state are extremely hard to build, in part because you quickly run into some very difficult problems involving consensus. I doubt this will change anytime soon because it’s based on fundamental math that is not going to change. A lot of the current NoSQL efforts will eventually end in tears or at least be boxed into limited corners for this reason.
Bandwidth problems like what we see with EC2 look like ephemeral issues. NFS didn’t work so great for many years but is now robust and fast. I would expect cloud storage to follow the same trajectory. Some of the current performance problems just sound like a challenge to do better.
Robert Hodges
13 Jun 11 at 1:05 am
The cloud is Infrastructure as a Service (IaaS). If you plan on using the cloud as a means of instantly getting access to more H/W as your site grows, you are severely mistaken.
The cloud is ideal for certain workloads (I know of clients with 300+ always-running instances), and the model is best suited to supporting spikes. However, if you do not architect your application around the strengths of cloud computing (a.k.a. async processing, queuing, many smaller servers, tolerating unavailability, etc.), you are just going to experience pain as your application grows and scales.
There are many downsides including what you describe. The most critical is “virtualization” which creates unpredictable impacts especially around Disk I/O and network latency which is one of the most important considerations for a database.
As cloud providers improve you will find that virtual cloud options like Eucalyptus and Open Stack (disclaimer: I have used neither of these yet) that run on the larger dedicated H/W you seek will become more applicable for large databases on the cloud.
Ronald Bradford
13 Jun 11 at 12:25 pm
Dealing with spikes of load is one of the things that people talk about a lot but I don’t think they understand the realities until they’ve done it for a while. For web servers and other things that have no persistence, it’s great. Most database servers, however, can’t just be spun up and down as needed. It’s not a matter of the DB server being hard to manage, it’s because you still need a whole database server even if there’s only a small fraction of the load. Databases like Xeround that grow and shrink do so by repartitioning the data. Most people find that they end up with a lot of always-on database instances even when load is light. The “cloud database of the future” needs dynamic scale-up and scale-down, but it’s not yet clear that any of today’s technologies is really there. Certainly most of them aren’t. MySQL definitely isn’t.
This is all in addition to your other points; MySQL is not tolerant of variable I/O resources either, as you point out.
Xaprb
13 Jun 11 at 5:01 pm
Next time – don’t bite your tongue. Channel your inner Larry:
http://www.youtube.com/watch?v=KmXJSeMaoTY
:)
Mark Leith
14 Jun 11 at 5:10 am
http://xkcd.com/908/
pcrews
14 Jun 11 at 4:30 pm
I think you just were a bit too harsh (in your wording).
Anyway:
The quick scaling is definitely something that the cloud delivers.
IMHO what “everyone is wrong about”(TM) is that cloud doesn’t have to be EC2 like (virtualized).
I don’t see a reason that (given the budget is available) I couldn’t have a pool of hardware servers that will be dynamically reassigned to certain application depending on current needs. And that is where you’ll win.
If you have 100 machines that always have a certain load level and 20 machines in the pool (yes, I’m keeping the numbers on the lower level), then there’s nothing to keep you from assigning the pool machines to either frontend servers or “magically” firing up database instances that will replace currently broken parts of a cluster, or some other usage that is needed “right now”.
That won’t quite apply to the classic RDBMS model. This is where just a few well sized machines are perfect. But the “kool new aid” (a.k.a. NoSQL – there I said it!) like Cassandra, scalaris and such will definitely benefit from this model.
Now multiply those numbers by 100 and find:
* time of usage spike (per application)
* level of max usage that is acceptable to you
That is the win of what is marketed as the cloud. Of course one could also just say: “Yes we can (re-)deploy our servers really fast and we have lots of spare hardware that fits our environment” instead of “cloud”.
.oO(But yes, I agree cloud providers want you to use a large number of small instances, and that isn’t always what’s needed)
serverhorror
17 Jun 11 at 3:41 pm
Please provide examples of anyone going from 20 to many more servers for Cassandra or HBase while using EC2. I doubt it is either quick or easy if they have a reasonable amount of data. It is even harder when they use local attached storage rather than EBS and I am skeptical about using EBS for either HBase or Cassandra. When local storage is used it can take some time to transfer data to new servers.
Mark Callaghan
20 Jun 11 at 10:52 am
I think Teo Schlossnagle wrote a good article on this topic back in March 2010: http://omniti.com/seeds/2010/spring “The cloud is great. Stop the hype”
Ted Wennamrk
23 Jun 11 at 3:15 am