Xaprb

Stay curious!

Products that scale linearly to hundreds of servers

I see this all the time:

[product] scales linearly to hundreds of servers

I haven’t seen a benchmark yet that’s truly a straight line. I would like to see one.

Written by Xaprb

November 2nd, 2010 at 4:38 pm

Posted in SQL

16 Responses to 'Products that scale linearly to hundreds of servers'

  1. If the drop-off were constant, it’s possible you could have a straight line. It just would not be at a 45-degree angle as you double nodes, but at a lower angle.

    *Or* it could be a straight line if you plotted it on a log scale or added some sleep calls at lower concurrency.

    Morgan Tocker

    2 Nov 10 at 5:03 pm

  2. Right, I’m not looking for a 45-degree angle, just a straight line. It’s a miracle to achieve that.

    Xaprb

    2 Nov 10 at 5:07 pm

  3. Horizontal sharding should scale linearly, right? Also something like memcached or any distributed hash table should be pretty linear.

    Andy

    2 Nov 10 at 5:41 pm

  4. Have you looked at voltdb?

    http://voltdb.com/product

    Their graph looks good (whose doesn’t?), but I haven’t read much about it that wasn’t written by the company.

    William

    2 Nov 10 at 5:45 pm

  5. @William VoltDB is very similar in design to MySQL NDB Cluster, which also boasts linear scaling. If you already have MySQL, going to NDB is much less of a jump than going to VoltDB, especially since NDB does not require you to write all of your transactions as Java stored procedures.

    http://mikaelronstrom.blogspot.com/2008/09/linear-scalability-of-mysql-cluster.html

  6. @Matthew True, MySQL NDB Cluster sounds like it would be an easier migration. Last time I looked into it, more traditional SQL queries with significant joins performed pretty poorly on Cluster.

    Volt seems good, with the exception of the whole Java stored procedures thing.

    William

    2 Nov 10 at 6:33 pm

  7. Sorry to abuse the comments section but:

    @William In sharing many aspects of its design with NDB, VoltDB likewise suffers from poor join performance. However, NDB has recently taken some major steps forward in improving the performance of some JOIN types. The MySQL Cluster development team is actively looking for people to test it out and let them know where it does or does not yet meet their needs.

    http://johanandersson.blogspot.com/2010/04/mysql-cluster-spj-preview-feedback_27.html

    http://www.clusterdb.com/mysql-cluster/trying-out-mysql-push-down-join-spj-preview/

  8. VoltDB does not scale linearly. I will prove that later. Can anyone point me to benchmarks of MySQL Cluster from 1 to N nodes that show it scaling linearly? Not blog posts with the text “linear scaling” in them, but posts that show X performance at 1 node, X*2 at 2 nodes, … X*32 at 32 nodes.

    Xaprb

    2 Nov 10 at 8:26 pm

  9. Actually, I’ll prove *now* that VoltDB doesn’t scale linearly. The graph on the page you linked to shows 6 nodes at about 360000 TPS, and 12 nodes at less than 600000 TPS. That’s not linear.

    Xaprb

    2 Nov 10 at 8:30 pm

  10. Matthew, sorry, I missed the link in your comment, my mistake. Mikael Ronstrom’s post says “..roughly 97% improved performance by doubling number of nodes.” Sorry — 97% isn’t enough, it has to be 100% to be linear.

    Xaprb

    2 Nov 10 at 8:32 pm

  11. Sorry, I thought you *weren’t* looking for a 45-degree angle on the line, which would be a slope of 100%. Linear just means that the increase is constant and proportional to the number of additional nodes. NDB does provide this.

  12. The 45-degree angle thing is really just a confusion that I shouldn’t have perpetuated. It doesn’t really make sense. (Sorry Morgan…)

    NDB absolutely does not provide a constant increase directly proportional to the number of nodes. If it did, then doubling the nodes would yield a 100% gain. Period. If 2x the nodes yields only a 97% gain, then we have a curve, not a line. We have a 3% deviation from linearity.

    It is extremely rare and specialized to find systems that even have the possibility of linear scaling. A distributed database that does transactions across a cluster is not one of them.

    Xaprb

    3 Nov 10 at 11:34 am

  13. It occurred to me while I was out doing errands that I should give a more complete example of why 100% correspondence between number of nodes and throughput is the only thing that’s actually linear. I think it’s easy to fall into a mental trap of “there is some performance loss, but the amount of loss remains constant per node as you add nodes, thus it’s linear scaling.”

    Suppose 1 node is 100 units of performance; 2 nodes is 180. That’s a 10% deviation from perfect scaling. Suppose we double again, and we “hold the loss constant and add 1.8x capacity for every doubling of nodes.” Then we have 4 nodes, and 180 times 1.8 = 324 units.

    Plot that on a graph. It’s a curve. A “constant amount of loss” is not linear.

    Think about it this way: it’s the same type of thing as compound interest. You know that the principal plus interest grows exponentially as the compounding continues. Your account balance doesn’t grow linearly. Neither does the performance loss from adding nodes.

    Xaprb

    3 Nov 10 at 1:19 pm

  14. Jonas’s comment:

    DBT2 is quite a complicated benchmark. The reason MySQL Cluster didn’t scale 100% linearly is that the history table can’t be partitioned so that a transaction spans only 1 node group (unless you alter the benchmark, of course).

    But given an easier benchmark that allows transactions to be 100% partitioned, it should be linearly (100%) scalable…

    So I *don’t* think it’s impossible…

    Jonas Oreland

    3 Nov 10 at 4:07 pm

  15. VoltDB engineer here.

    On linear scalability
    Referring to 97% scalability as linear seems to me a pretty inoffensive exaggeration. As for VoltDB, we’ve tested performance on up to 30 nodes, so I can’t speak to hundreds. For many kinds of workloads in VoltDB, if you double the number of nodes you will get double the performance. This is not true for all workloads and is usually not true for clusters with fewer than three nodes. Yes, there is a graph on our website that is a particular benchmark with some noise to it. This simple graph is a poor substitute for a real POC.

    On joins
    VoltDB can perform many common and useful joins with blazing speed. Joins that would require large data moves between nodes are out of scope for our project. I’m not aware of anyone in the transaction processing space who does these well. The analytics space is a different story.

    On MySQL NDB Cluster
    NDB and VoltDB have some key similarities and some key differences. Both require some effort on the part of the app developer to get the performance they promise. While I’m totally biased, I think VoltDB’s strongest win over NDB is simplicity. The time it takes to go from beginner to expert seems dramatically different between the two systems.

    John Hugg

    5 Nov 10 at 2:45 pm

  16. Hi, stumbled onto this blog, saw this question, and can’t resist pointing you to http://www.xlmpp.com
    Enjoy,
    abroad

    abroad

    15 Nov 10 at 12:08 pm
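
The arithmetic in comments 9 and 13 is easy to check by hand, but a short script makes the shape of the curve explicit. What follows is a minimal sketch that uses only the figures quoted in the thread (100 units on 1 node with 1.8x per doubling, and roughly 360000 TPS on 6 nodes against an upper bound of 600000 TPS on 12). The function and variable names are made up for illustration and do not come from any benchmark tool.

```python
import math


def throughput_after_doublings(base, factor, nodes):
    """Throughput when each doubling of the node count multiplies throughput by `factor`."""
    return base * factor ** math.log2(nodes)


def gain_percent(before, after):
    """Percentage gain in throughput going from `before` to `after`."""
    return (after - before) / before * 100.0


# Comment 13's example: 100 units on 1 node, 1.8x for every doubling of nodes.
node_counts = [1, 2, 4, 8, 16, 32]
series = {n: throughput_after_doublings(100.0, 1.8, n) for n in node_counts}

# Fraction of the ideal 100 * n units actually delivered at each cluster size.
efficiency = {n: series[n] / (100.0 * n) for n in node_counts}

# Units gained per added node across each doubling; a straight line would keep this constant.
slopes = {n: (series[2 * n] - series[n]) / n for n in node_counts[:-1]}

for n in node_counts:
    print(f"{n:>2} nodes: {series[n]:8.1f} units, {efficiency[n]:.0%} of ideal")

# Comment 9's reading of the graph: 600000 TPS is an upper bound for 12 nodes.
voltdb_gain_upper_bound = gain_percent(360_000, 600_000)
print(f"VoltDB 6 -> 12 nodes: under {voltdb_gain_upper_bound:.0f}% gain")
```

Under these assumptions the gain per added node falls from 80 units over the first doubling to 72 over the second and keeps shrinking, while the fraction of ideal throughput drops by a factor of 0.9 with every doubling. That is the compounding effect comment 13 describes. The 6-to-12-node figures from comment 9 work out to a gain of less than about 67%, against the 100% that doubling would need.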
