Comments on: When systems scale better than linearly http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/ Stay curious! Thu, 02 May 2013 12:36:53 +0000 hourly 1 http://wordpress.org/?v=3.5.1 By: Henrik Ingo http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/#comment-19680 Henrik Ingo Fri, 07 Oct 2011 12:09:25 +0000 http://www.xaprb.com/blog/?p=2493#comment-19680 Mikael discusses better use of CPU caches, but that’s because he is talking about MySQL Cluster which is already an in memory database. Probably you can see better-than-linear scaling much more easily on a disk based database where dataset is larger than memory on a single node.

We know well (for instance see the Yoshinori tutorial from this year’s MySQL Conference, or coming up in Percona Live UK) that it makes a huge difference if your dataset fits in RAM or not. So like Vadim explains here, once you add more nodes, and assuming the workload can be partitioned so that you don’t just have an identical copy of the InnoDB buffer pool on each node, then by adding nodes you effectively increase the buffer pool. Once it is bigger than the dataset, you get huge increase in performance.

If you want to mode this, the interesting question is what happens as you approach that threshold. You get better performance but I don’t know if there is a simple answer.

]]>
By: Xaprb http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/#comment-19678 Xaprb Fri, 07 Oct 2011 04:14:07 +0000 http://www.xaprb.com/blog/?p=2493#comment-19678 You can feel free to disagree on the definition of hardware scaling — it’s not my definition, it’s Neil Gunther’s :) I agree with you, we need a way to model the kinds of things you’re talking about.

]]>
By: VadimTk http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/#comment-19677 VadimTk Fri, 07 Oct 2011 03:55:39 +0000 http://www.xaprb.com/blog/?p=2493#comment-19677 Baron,

I am disagree on your definition of hardware scaling.

Let us have system “NeuvoSQL”, which should be black box,
we do not care what is inside.
Initially it works on 2-nodes, we can scale it to 4-nodes or to 6-nodes, all nodes are equal.
You do not know how it handles data inside.
You only put 512 threads workload on this.
Is this hardware scaling ? I believe it is.

The results ( which are received on real system, available on market) are following:
2-nodes: 5000 tps
4-nodes: 12000 tps
6-nodes: 19000 tps

It is better than linear scalability for me, it was received only by hardware scaling.

If USL model can’t handle it, that’s unfortunate, as I said such systems are real, not theoretical, and they are available on market.

]]>
By: Xaprb http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/#comment-19676 Xaprb Fri, 07 Oct 2011 03:36:33 +0000 http://www.xaprb.com/blog/?p=2493#comment-19676 The USL model works two ways:

1. Software scaling. You hold everything the same, except you increase concurrency (threads). More workers should get more work done. If the throughput is proportional to the increased concurrency, it’s linear scalability.

2. Hardware scaling. You hold the workload per node as a constant, but you increase the number of nodes doing work. This means if you double the number of servers in the cluster, you should still have the same number of threads per server and the same amount of data per server. If the increase in throughput is proportional to the increase in nodes, it’s linear.

But if we double the number of nodes and keep the dataset the same, that doesn’t fit into the USL model. I think the USL could be extended to model this, but I believe that we’re getting into a lot of complexity involving the working set and the memory boundary. You know this isn’t trivial to understand because it depends on the distribution of requests that cause cache hits versus cache misses.

I think that the super-linear scaling mentioned at the links I showed above are related to this, although I think that Ronstrom’s blog post has too many variables in flux to model simply.

]]>
By: VadimTk http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/#comment-19675 VadimTk Fri, 07 Oct 2011 02:50:41 +0000 http://www.xaprb.com/blog/?p=2493#comment-19675 Baron,

Better than linear scalability can be achieved relatively easy, especially I expect we can see it with “NewSQL” systems with auto-sharding capabilities.

The math is following:
let’s assume 1 server with X GB of data and Y GB of memory,
now we go to to 2 servers sharding data.
For simplicity, now each server will have X/2 GB of data and Y GB of memory.

Now, if we look on graph
http://www.mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
in some points you can see that doubling memory can actually increase
performance 10x.

So moving from 1 server to 2 servers, and re-balancing data, we can get 10x improvement.

I am lazy to figure out math, but it seems there should be coefficient on data decreasing.

]]>