Xaprb

Stay curious!

Archive for the ‘Scalability’ Category

Fundamental performance and scalability instrumentation

This post is a follow-up to some promises I made at Postgres Open.

Instrumentation can be a lot of work to add to a server, and it can add overhead to the server too. The bits of instrumentation I’ll advocate in this post are few and trivial, but disproportionately powerful.

If all server software shipped with these metrics as the basic starting point, it would change the world forever:

  1. Time elapsed, in high resolution (preferably microseconds; milliseconds is okay; one-second resolution is mostly useless). When I read this counter, it simply tells me the time of day, the server’s uptime, or something similar. It marks the boundaries of an observation interval, which is defined by two measurements. It needs to be consistent with the other metrics that I’ll explain next.
  2. The number of queries (statements) that have completed.
  3. The current number of queries being executed.
  4. The total execution time of all queries, including the in-progress time of currently executing queries, in high resolution. That is, if two queries executed with 1 second of response time each, the result is 2 seconds, whether the queries executed concurrently or serially. If one query started executing 0.5 seconds ago and is still executing, it should contribute 0.5 seconds to the counter.
  5. The server’s total busy time, in high resolution. This differs from the previous metric in that it counts only the portion of the observation interval during which at least one query was executing, regardless of how many were running. If two queries with 1-second response times executed serially, the counter is 2 seconds. If they executed concurrently, the counter is something less than 2 seconds, because the overlapping time isn’t double-counted.
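To make the difference between metrics 4 and 5 concrete, here is a small Python sketch that computes both from a list of query start and end times. The function name `total_and_busy` and the interval representation are my own illustration, not part of any server's API.

```python
def total_and_busy(intervals):
    """Return (total execution time, busy time) for (start, end) query intervals."""
    # Metric 4: every query contributes its full duration, overlapping or not.
    total = sum(end - start for start, end in intervals)

    # Metric 5: length of the union of the intervals, so overlap counts once.
    busy = 0.0
    covered_until = None
    for start, end in sorted(intervals):
        if covered_until is None or start > covered_until:
            # A gap (or the first query): the whole interval is new busy time.
            busy += end - start
            covered_until = end
        elif end > covered_until:
            # Partial overlap: only the part past the covered region is new.
            busy += end - covered_until
            covered_until = end
    return total, busy


serial = total_and_busy([(0.0, 1.0), (1.0, 2.0)])       # two 1-second queries, back to back
overlapping = total_and_busy([(0.0, 1.0), (0.5, 1.5)])  # same queries, half a second of overlap
print(serial)       # (2.0, 2.0)
print(overlapping)  # (2.0, 1.5)
```

In both cases the total execution time is 2 seconds, but the busy time drops to 1.5 seconds once the queries overlap.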

In practice, these can be maintained as follows, in pseudo-code:


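// Shared counters. In a multi-threaded server, each update block
// below must run atomically (for example, under a lock).
global timestamp;    // time of the most recent query start or completion
global concurrency;  // number of queries currently executing
global busytime;     // time during which at least one query was executing
global totaltime;    // sum of execution time across all queries
global queries;      // number of queries completed

function run_query() {
  local now = time();
  if ( concurrency ) {
    // Other queries are already running: credit the time since the last event.
    busytime += now - timestamp;
    totaltime += (now - timestamp) * concurrency;
  }
  concurrency++;
  timestamp = now;

  // Execute the query, and when it completes...

  now = time();
  // This query was running, so concurrency is at least 1 here.
  busytime += now - timestamp;
  totaltime += (now - timestamp) * concurrency;
  concurrency--;
  timestamp = now;
  queries++;
}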

I may have missed something there; I’m writing this off the cuff. If I’ve messed up, let me know and I’ll fix it. In any case, these metrics can be used to derive all sorts of powerful things through applications of Little’s Law and queueing theory, as well as providing the inputs to the Universal Scalability Law. They should be reported by simply reading from the variables marked as “global” above, to provide a consistent view of the metrics.
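As a rough illustration of how the counters and two snapshots fit together, here is a Python sketch. The `QueryMetrics` class, its method names, and `derive` are hypothetical, not taken from any real server. I am assuming that a lock is an acceptable way to get a consistent view, and that the snapshot should credit the time since the last event so in-progress queries are included. The mean response time is only approximate when queries straddle the edges of the observation interval.

```python
import threading
import time


class QueryMetrics:
    """Hypothetical tracker for the five counters; all names are illustrative."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._timestamp = clock()  # time of the last start, finish, or snapshot
        self._concurrency = 0
        self._busytime = 0.0
        self._totaltime = 0.0
        self._queries = 0

    def _advance(self, now):
        # Credit the time since the last event. The caller must hold the lock.
        elapsed = now - self._timestamp
        if self._concurrency:
            self._busytime += elapsed
            self._totaltime += elapsed * self._concurrency
        self._timestamp = now

    def start_query(self):
        with self._lock:
            self._advance(self._clock())
            self._concurrency += 1

    def finish_query(self):
        with self._lock:
            self._advance(self._clock())
            self._concurrency -= 1
            self._queries += 1

    def snapshot(self):
        # Advance first so in-progress time is counted (my assumption about
        # how to satisfy metric 4), then copy every counter under one lock.
        with self._lock:
            self._advance(self._clock())
            return {
                "time": self._timestamp,
                "queries": self._queries,
                "concurrency": self._concurrency,
                "totaltime": self._totaltime,
                "busytime": self._busytime,
            }


def derive(first, second):
    """Derive interval metrics from two snapshots taken at different times."""
    interval = second["time"] - first["time"]
    if interval <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    completed = second["queries"] - first["queries"]
    total = second["totaltime"] - first["totaltime"]
    busy = second["busytime"] - first["busytime"]
    return {
        "throughput": completed / interval,      # queries per second
        "mean_concurrency": total / interval,    # Little's Law: N = X * R
        "utilization": busy / interval,          # fraction of time busy
        "mean_response_time": total / completed if completed else None,
    }


# Usage with a hand-driven clock so the numbers are easy to follow.
ticks = [0.0]
metrics = QueryMetrics(clock=lambda: ticks[0])
before = metrics.snapshot()
metrics.start_query()        # query A starts at t=0.0
ticks[0] = 0.5
metrics.start_query()        # query B starts at t=0.5
ticks[0] = 1.0
metrics.finish_query()       # query A finishes at t=1.0
ticks[0] = 1.5
metrics.finish_query()       # query B finishes at t=1.5
ticks[0] = 2.0
after = metrics.snapshot()
result = derive(before, after)
print(result)
```

Over this 2-second interval, two 1-second queries completed with half a second of overlap, so the sketch reports a throughput of 1 query per second, a mean concurrency of 1, a utilization of 0.75, and a mean response time of 1 second. The mean concurrency equals throughput times mean response time, which is Little's Law.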

Written by Xaprb

October 6th, 2011 at 5:51 pm

Surge 2011 slides, recap

This year’s Surge conference was a great sophomore event to follow up last year’s inaugural conference. A lot of very smart people were there, and the hallway track was great.

I took part in three sessions: I gave a lightning talk about causes of MySQL downtime, chaired a panel on Big Data and the Cloud, and showed how to derive scalability and performance metrics from TCP traffic. I’ve sent my slides to the Surge organizers, and I understand that they will be posting them as well as integrating them into the video of my session. In the meantime, you can download my slides from Percona’s presentations page.

Written by Xaprb

October 6th, 2011 at 4:25 pm

Posted in Conferences,Scalability,SQL

I’ll be presenting at Postgres Open 2011

I’ve been accepted to present at the brand-new and very exciting Postgres Open 2011. My talk covers system scaling, TCP traffic, and mathematical modeling. I’m really looking forward to it; it will be my first PostgreSQL conference in a couple of years! See you there.

Written by Xaprb

July 19th, 2011 at 4:04 pm