Xaprb

Stay curious!

Archive for the ‘PostgreSQL’ Category

When systems scale better than linearly

I’ve seen a few cases where Neil J. Gunther’s Universal Scalability Law doesn’t seem to model all of the important factors in a system as it scales. Models are only models, not the whole truth, so they never match reality perfectly. But there appear to be a small number of cases where systems actually scale a bit better than linearly over a portion of the domain, due to what I’ve been calling an “economy of scale.” I believe that the Universal Scalability Law might need a third factor: alongside seriality and coherency, a new one for economy of scale. I don’t think that the results I’m seeing can be modeled adequately with only two parameters.
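
To make the two-parameter limitation concrete, here is a minimal Python sketch of the usual form of the law, C(N) = N / (1 + σ(N − 1) + κN(N − 1)), where σ stands for seriality and κ for coherency. The function name and the coefficient values are made up for illustration and are not fitted to any real system. The point is only that when both coefficients are non-negative the denominator is at least 1, so the modeled capacity can never exceed linear scaling.

```python
def usl_capacity(n, sigma, kappa):
    """Relative capacity C(N) under the two-parameter Universal Scalability Law."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# Illustrative coefficients only; they are assumptions, not measurements.
SIGMA, KAPPA = 0.02, 0.0005

for n in (1, 2, 4, 8, 16, 32):
    c = usl_capacity(n, SIGMA, KAPPA)
    # With sigma >= 0 and kappa >= 0 the denominator is >= 1, so C(N) <= N.
    print(f"N={n:2d}  C(N)={c:6.2f}  efficiency={c / n:.3f}")
```

The efficiency column, C(N)/N, never rises above 1 here, which is why a better-than-linear measurement can't be fitted with these two non-negative terms alone.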

Here are two publicly available cases that appear to demonstrate this phenomenon: Robert Haas’s recent blog post on PostgreSQL, titled “Scalability, in Graphical Form, Analyzed,” and Mikael Ronstrom’s post from May on MySQL (NDB) Cluster, titled “Better than Linear Scaling is Possible.”

Dr. Ronstrom’s post discusses the mechanics of the phenomenon, and speculates (I’m not sure it’s conclusive) that it comes from a combination of partitioning and better use of CPU caches. Now someone needs to do the math to figure out how to include this factor in the equation.

The good thing about the Universal Scalability Law is how simple and applicable it is for many systems. It’s nice that this economy-of-scale factor seems to be unusual and the simpler model remains easy to apply for a large variety of tasks.

Written by Baron Schwartz

October 6th, 2011 at 10:33 pm

Fundamental performance and scalability instrumentation

This post is a followup to some promises I made at Postgres Open.

Instrumentation can be a lot of work to add to a server, and it can add overhead to the server too. The bits of instrumentation I’ll advocate in this post are few and trivial, but disproportionately powerful.

If all server software shipped with these metrics as the basic starting point, it would change the world forever:

  1. Time elapsed, in high resolution (preferably microseconds; milliseconds is okay; one-second is mostly useless). When I ask for this counter, it simply tells me either the time of day, or the server’s uptime, or something like that. It can be used to determine the boundaries of an observation interval, defined by two measurements. It needs to be consistent with the other metrics that I’ll explain next.
  2. The number of queries (statements) that have completed.
  3. The current number of queries being executed.
  4. The total execution time of all queries, including the in-progress time of currently executing queries, in high resolution. That is, if two queries executed with 1 second of response time each, the result is 2 seconds, no matter whether the queries executed concurrently or serially. If one query started executing 0.5 seconds ago and is still executing, it should contribute 0.5 seconds to the counter.
  5. The server’s total busy time, in high resolution. This is different from the previous point in that it only shows the portion of the observation interval during which queries were executing, regardless of whether they were concurrent or not. If two queries with 1-second response time executed serially, the counter is 2. If they executed concurrently, the counter is something less than 2, because the overlapping time isn’t double-counted.
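
The difference between counters 4 and 5 is easy to get wrong, so here is a small Python sketch that computes both from a list of (start, end) query intervals. The function name and the sample intervals are hypothetical. It is meant only to illustrate the definitions, not to suggest how a server should keep the counters.

```python
def total_and_busy_time(intervals):
    """Return (total execution time, busy time) for (start, end) query intervals."""
    # Counter 4: every query contributes its full duration, overlapping or not.
    total = sum(end - start for start, end in intervals)

    # Counter 5: merge overlapping intervals so shared time is counted once.
    busy = 0.0
    covered_until = None
    for start, end in sorted(intervals):
        if covered_until is None or start > covered_until:
            busy += end - start
            covered_until = end
        elif end > covered_until:
            busy += end - covered_until
            covered_until = end
    return total, busy


serial = [(0.0, 1.0), (1.0, 2.0)]       # two 1-second queries, back to back
overlapping = [(0.0, 1.0), (0.5, 1.5)]  # two 1-second queries, half overlapped

print(total_and_busy_time(serial))       # (2.0, 2.0)
print(total_and_busy_time(overlapping))  # (2.0, 1.5)
```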

In practice, these can be maintained as follows, in pseudo-code:


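global timestamp;    // time of the most recent counter update
global concurrency;  // number of queries currently executing
global busytime;     // time during which at least one query was executing
global totaltime;    // sum of all queries' execution time
global queries;      // number of completed queries

// If queries run in several threads, each update section must be atomic.
function run_query() {
  local now = time();
  if ( concurrency ) {
    // Other queries are running: charge them for the time since the last update.
    busytime += now - timestamp;
    totaltime += (now - timestamp) * concurrency;
  }
  concurrency++;
  timestamp = now;

  // Execute the query, and when it completes...

  now = time();
  // This query, and any concurrent ones, ran since the last update.
  busytime += now - timestamp;
  totaltime += (now - timestamp) * concurrency;
  concurrency--;
  timestamp = now;
  queries++;
}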

I may have missed something there; I’m writing this off the cuff. If I’ve messed up, let me know and I’ll fix it. In any case, these metrics can be used to derive all sorts of powerful things through applications of Little’s Law and queueing theory, as well as providing the inputs to the Universal Scalability Law. They should be reported by simply reading from the variables marked as “global” above, to provide a consistent view of the metrics.
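
As a rough illustration of how the counters might be kept and then used, here is a Python sketch. Everything in it is an assumption for the sake of the example: the class QueryMetrics, the function derive, and the FakeClock test clock are hypothetical names, not any server's API. It also departs from the pseudo-code in one respect: snapshot() first charges the time elapsed since the last update, which is one way to fold in-progress queries into the reading. The derived numbers come from Little's Law (average concurrency = throughput × average response time).

```python
import threading
import time


class QueryMetrics:
    """Hypothetical holder for the five counters; not a real server API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._timestamp = clock()
        self._concurrency = 0
        self._busytime = 0.0
        self._totaltime = 0.0
        self._queries = 0

    def _advance(self):
        # Caller holds the lock. Charge the time since the last update.
        now = self._clock()
        elapsed = now - self._timestamp
        if self._concurrency:
            self._busytime += elapsed
            self._totaltime += elapsed * self._concurrency
        self._timestamp = now
        return now

    def query_started(self):
        with self._lock:
            self._advance()
            self._concurrency += 1

    def query_finished(self):
        with self._lock:
            self._advance()
            self._concurrency -= 1
            self._queries += 1

    def snapshot(self):
        # Advancing first includes in-progress query time (an assumption
        # that goes beyond the pseudo-code above).
        with self._lock:
            now = self._advance()
            return {
                "time": now,
                "queries": self._queries,
                "concurrency": self._concurrency,
                "totaltime": self._totaltime,
                "busytime": self._busytime,
            }


def derive(a, b):
    """Derive interval metrics from two snapshots, with a taken before b."""
    interval = b["time"] - a["time"]
    completed = b["queries"] - a["queries"]
    total = b["totaltime"] - a["totaltime"]
    busy = b["busytime"] - a["busytime"]
    return {
        "throughput": completed / interval,    # queries per second
        "avg_concurrency": total / interval,   # Little's Law: N = X * R
        "utilization": busy / interval,        # fraction of interval busy
        "avg_response_time": total / completed if completed else None,
    }


class FakeClock:
    """Deterministic clock so the example does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
metrics = QueryMetrics(clock)
first = metrics.snapshot()

# Two 1-second queries that overlap for half a second.
for t, event in [(0.0, metrics.query_started), (0.5, metrics.query_started),
                 (1.0, metrics.query_finished), (1.5, metrics.query_finished)]:
    clock.now = t
    event()

clock.now = 2.0
result = derive(first, metrics.snapshot())
print(result)
```

Over this 2-second interval the sketch reports a throughput of 1 query per second, an average concurrency of 1, a utilization of 0.75, and an average response time of 1 second.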

Written by Baron Schwartz

October 6th, 2011 at 5:51 pm

Well done, Postgres Open

I thought that Postgres Open 2011 was very well done. I liked the content, the location, and most especially the atmosphere, which felt much more welcoming than some PostgreSQL conferences I’ve attended. This last point bears repeating: I’d exceeded my tolerance for trash talk about MySQL at other conferences, and this event made me feel valued again. I believe that the leaders and organizers set the tone, so I think that Selena and the committee deserve a lot of credit and thanks for the warm atmosphere.

I see that Selena has already announced that there’ll be a 2012 event, which is great. I intend to support it, and I’ve already marked the date on my calendar.

A few people asked me what instrumentation inside PostgreSQL would be valuable for scalability and performance analysis. In hindsight, the answer I gave in my talk somewhat sidestepped the question, and I agreed afterwards to follow up with a blog post about it. I’ll write that when I have a chance.

Written by Baron Schwartz

September 19th, 2011 at 5:06 pm