Xaprb » PostgreSQL (http://www.xaprb.com/blog): Stay curious! Last updated Mon, 29 Apr 2013 14:15:07 +0000

New translations of High Performance MySQL
http://www.xaprb.com/blog/2013/03/31/new-translations-of-high-performance-mysql/
Xaprb · Mon, 01 Apr 2013 02:01:27 +0000

High Performance MySQL, 3rd Edition has been selling very well. It’s translated into many languages. O’Reilly sends me a hard copy of each translation, and I have a whole section of my bookshelf dedicated to them. It’s really satisfying to look at.

Today I’m happy to announce that we’re moving forward with a new batch of translations. Demand has been so strong that we want to make the book accessible to as wide an audience as possible. Plus, I get a fat check every time O’Reilly sells the translation rights.

The new languages will include Australian, l337 (“Leet”), Jive, Ebonics, Elmer Fudd, Blissymbols, and Esperanto. Here’s a sample before-and-after paragraph:

Isolating the Column

We commonly see queries that defeat indexes or prevent MySQL from using the available indexes. MySQL generally can’t use indexes on columns unless the columns are isolated in the query. “Isolating” the column means it should not be part of an expression or be inside a function in the query.

Here’s the same passage, translated to Australian:

˙ʎɹǝnb ǝɥʇ uı uoıʇɔunɟ ɐ ǝpısuı ǝq ɹo uoıssǝɹdxǝ uɐ ɟo ʇɹɐd ǝq ʇou plnoɥs ʇı suɐǝɯ uɯnloɔ ǝɥʇ ”ƃuıʇɐlosI“ ˙ʎɹǝnb ǝɥʇ uı pǝʇɐlosı ǝɹɐ suɯnloɔ ǝɥʇ ssǝlun suɯnloɔ uo sǝxǝpuı ǝsn ʇ’uɐɔ ʎllɐɹǝuǝƃ ˥QSʎW ˙sǝxǝpuı ǝlqɐlıɐʌɐ ǝɥʇ ƃuısn ɯoɹɟ ˥QSʎW ʇuǝʌǝɹd ɹo sǝxǝpuı ʇɐǝɟǝp ʇɐɥʇ sǝıɹǝnb ǝǝs ʎluoɯɯoɔ ǝM

uɯnloƆ ǝɥʇ ƃuıʇɐlosI

And here’s the sample in Jive:

Them Columns Cut a Lemon fo Isolatin’

Ain’t nothin but a thang bout them messin’ up my old lady’s indexes cain’t be runnin’ upside down yo’ head. Slap my fro. MySQL can’t dig it with lay no indexes on dem less’n you gets ‘em say I won say I pray I get the same ol’ same ol’. Yo SQL, MySQL, all them SQL. What it is, Mama, what it is. Knock yoself a pro slick, get ‘em spreshuns ain’t be togetha. Use yo’ gray mattah! True dat, git it out wid de functions. Come on got to be! Sheeeeeeeh.

There may be some rough edges, of course. This is only an early draft.

In addition, we are translating the technical examples and code samples into additional computer languages, including popular ones like LOLCATS, ALGOL (sorry, not the latest release — that will come soon), and even obscure languages like Node.JS and Commodore 64. We’re also extending the book with compatibility plugins — sort of “skins” or “personalities” if you will — that will let you apply all the knowledge in the book to irrelevant, obscure database servers like Oracle, PostgreSQL (a.k.a. “Postgre”), Riak, and FAT32.

Your feedback and suggestions are welcome. Let me know if there’s anything I can do to help make your High Performance MySQL experience more enjoyable. Or, if you prefer: Slide your jib, brother sky, don’t be sayin’ no off-time jive, lay it on, you dig? Mash me a fin.

What makes relational databases relational?
http://www.xaprb.com/blog/2012/03/13/what-makes-relational-databases-relational/
Xaprb · Tue, 13 Mar 2012 21:32:25 +0000

Do you know why relational databases are called relational? I commonly see explanations such as this:

an RDBMS is called a relational database system because the data is stored in tables.

There, now that’s all cleared up! Or not.

The explanation I hear cited most often is that the name comes from relationships between data. But this isn’t really accurate.

The real reason is relational algebra, which takes its name from a mathematical construct called a relation. A relation doesn’t have any obvious or intuitive association with “relationships.” It’s one of those words that a mathematician redefined for a very specific purpose, and that was the end of it. Just as computer science uses words such as “inheritance,” “class,” and “instantiate” in very specific ways that don’t make sense to non-programmers, “relation” has a meaning that makes most people’s eyes glaze over.
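The post doesn't define a relation, so here is a minimal illustration in Python. It is a sketch, not a formal treatment. In the mathematical sense, a relation over some domains is a set of tuples drawn from their Cartesian product. Relational algebra is a set of operators, such as selection and projection, that take relations and return relations. The sample data and the helper names `select` and `project` are hypothetical.

```python
from itertools import product

# Two small domains (hypothetical example data).
people = {"alice", "bob", "carol"}
cities = {"Lima", "Paris"}

# A relation over these domains is a subset of their Cartesian product,
# i.e. a set of tuples. A table is just one way of writing one down.
lives_in = {("alice", "Paris"), ("bob", "Lima"), ("carol", "Paris")}
assert lives_in <= set(product(people, cities))


def select(relation, predicate):
    """Selection (also called restriction): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *positions):
    """Projection: keep only some attributes. The result is a set, so duplicates vanish."""
    return {tuple(t[i] for i in positions) for t in relation}


# Operators on relations return relations, so they compose.
in_paris = project(select(lives_in, lambda t: t[1] == "Paris"), 0)
print(sorted(in_paris))  # [('alice',), ('carol',)]
```

Nothing in this example refers to "relationships" between tables. The word describes the set-of-tuples structure itself.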

Now, we can get into further arguments about whether relational databases are really relational — and lots of people do that — but I’ll stay away from that for the time being.

And in my best Paul Harvey voice, it’s time to say “and now you know… the rest of the story!”

Black-Box Performance Analysis with TCP Traffic
http://www.xaprb.com/blog/2012/02/23/black-box-performance-analysis-with-tcp-traffic/
Xaprb · Thu, 23 Feb 2012 20:17:54 +0000

This is a cross-post from the MySQL Performance Blog. I thought it would be interesting to users of PostgreSQL, Redis, Memcached, and $system-of-interest as well.

For about the past year I’ve been developing a set of tools and practices that can provide deep insight into system performance simply by looking at TCP packet headers and at when packets arrive at and depart from a system. This works for MySQL as well as many other types of systems, because it doesn’t require the contents of the packets. Thus, it works without any knowledge of what the server and client are saying to each other. Packet headers contain only information that’s usually regarded as non-sensitive (IP address, port, TCP flags, etc.), so it’s also very easy to get access to this data even in highly secure environments.
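The post doesn't spell out the techniques, which are in the white paper. What follows is only a minimal Python sketch of the general idea, and it is not the paper's method or any Percona Toolkit tool. It assumes a hypothetical request/response model: on each connection, a request starts with the first packet from the client and ends with the last packet the server sends back before the client speaks again. The sketch reads only a timestamp, the client address, and the direction of each packet. The `Packet` type and `response_times` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float       # capture timestamp in seconds
    client: str     # client "ip:port"; identifies the TCP connection
    inbound: bool   # True if the packet is headed to the server


def response_times(packets):
    """Estimate per-request response times from header timing alone.

    Hypothetical model: each connection alternates between a request
    (client -> server) and a response (server -> client). A request starts
    at the first inbound packet after the previous response, and ends at the
    last outbound packet before the next inbound packet.
    """
    state = {}    # client -> (request start, last outbound timestamp)
    results = []
    for p in sorted(packets, key=lambda p: p.ts):
        start, last_out = state.get(p.client, (None, None))
        if p.inbound:
            if last_out is not None:
                # A new request is arriving, so the previous one is finished.
                results.append(last_out - start)
                start, last_out = None, None
            if start is None:
                start = p.ts
        elif start is not None:
            last_out = p.ts   # the response is still flowing back to the client
        state[p.client] = (start, last_out)
    # Count requests whose response had started before the capture ended.
    for start, last_out in state.values():
        if start is not None and last_out is not None:
            results.append(last_out - start)
    return results


capture = [
    Packet(0.00, "10.0.0.5:40001", True),
    Packet(0.02, "10.0.0.5:40001", False),
    Packet(0.05, "10.0.0.5:40001", True),
    Packet(0.09, "10.0.0.5:40001", False),
]
print(response_times(capture))
```

Real traffic breaks this simple model in several ways, including pipelining, server-initiated messages, and retransmissions. The sketch only shows that useful timing falls out of headers, without looking at payloads.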

I’ve finally written up a paper that shows some of my techniques for detecting problems in a system. These techniques can be an easy way to answer questions such as “is there something we should look into more deeply?” without launching a full-blown analysis project first. The paper, “MySQL Performance Analysis with Percona Toolkit and TCP/IP Network Traffic,” is available from the white paper section of our website.

When systems scale better than linearly
http://www.xaprb.com/blog/2011/10/06/when-systems-scale-better-than-linearly/
Xaprb · Fri, 07 Oct 2011 02:33:13 +0000

I’ve seen a few cases where Neil J. Gunther’s Universal Scalability Law doesn’t seem to model all of the important factors in a system as it scales. Models are only models, not the whole truth, so they never match reality perfectly. But in a small number of cases, systems appear to scale a bit better than linearly over part of the domain. This is due to what I’ve been calling an “economy of scale.” I believe the Universal Scalability Law might need a third factor: in addition to seriality and coherency, a new economy-of-scale factor. I don’t think the results I’m seeing can be modeled adequately with only two parameters.
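The two-parameter limit is easiest to see from the Universal Scalability Law's usual form, C(N) = N / (1 + σ(N − 1) + κN(N − 1)). Here N is the concurrency (or number of nodes), σ is the seriality coefficient, and κ is the coherency coefficient. When both coefficients are non-negative, the denominator is at least 1 for every N ≥ 1, so C(N) ≤ N. The Python sketch below checks that bound numerically with made-up coefficients. It doesn't attempt to model the proposed economy-of-scale factor.

```python
def usl_capacity(n, sigma, kappa):
    """Relative capacity C(N) = N / (1 + sigma*(N-1) + kappa*N*(N-1)).

    sigma: seriality (contention) coefficient
    kappa: coherency (crosstalk) coefficient
    """
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# Hypothetical coefficients, chosen only for illustration.
SIGMA, KAPPA = 0.02, 0.0005

for n in (1, 2, 4, 8, 16, 32, 64):
    c = usl_capacity(n, SIGMA, KAPPA)
    # With sigma, kappa >= 0 the denominator is >= 1 for every N >= 1,
    # so C(N) can never exceed N: no better-than-linear region is possible.
    assert c <= n
    print(f"N={n:3d}  C(N)={c:6.2f}  linear={n}")
```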

Here are two publicly available cases that appear to demonstrate this phenomenon: Robert Haas’s recent blog post on PostgreSQL, “Scalability, in Graphical Form, Analyzed,” and Mikael Ronstrom’s post from May on MySQL (NDB) Cluster, “Better than Linear Scaling is Possible.”

Dr. Ronstrom’s post discusses the mechanics of the phenomenon and speculates (I’m not sure it’s conclusive) that it comes from a combination of partitioning and better use of CPU caches. Now someone needs to do the math to figure out how to include this factor in the equation.

The good thing about the Universal Scalability Law is that it is simple and applies to many systems. Fortunately, this economy-of-scale factor seems to be unusual, so the simpler model remains easy to apply to a wide variety of tasks.

Fundamental performance and scalability instrumentation
http://www.xaprb.com/blog/2011/10/06/fundamental-performance-and-scalability-instrumentation/
Xaprb · Thu, 06 Oct 2011 21:51:28 +0000

This post is a follow-up to some promises I made at Postgres Open.

Instrumentation can be a lot of work to add to a server, and it can add overhead to the server too. The bits of instrumentation I’ll advocate in this post are few and trivial, but disproportionately powerful.

If all server software shipped with these metrics as the basic starting point, it would change the world forever:

  1. Time elapsed, in high resolution (preferably microseconds; milliseconds is okay; one-second resolution is mostly useless). When I ask for this counter, it simply tells me the time of day, the server’s uptime, or something like that. Two readings of it define the boundaries of an observation interval. It needs to be consistent with the other metrics, which I explain next.
  2. The number of queries (statements) that have completed.
  3. The current number of queries being executed.
  4. The total execution time of all queries, in high resolution, including the in-progress time of currently executing queries. That is, if two queries each executed with 1 second of response time, the result is 2 seconds, regardless of whether the queries executed concurrently or serially. If one query started executing 0.5 seconds ago and is still executing, it should contribute 0.5 seconds to the counter.
  5. The server’s total busy time, in high resolution. This differs from the previous metric in that it counts only the portion of the observation interval during which queries were executing, regardless of whether they were concurrent. If two queries with 1-second response times executed serially, the counter is 2 seconds. If they executed concurrently, the counter is less than 2 seconds (as little as 1 second if they overlapped completely), because the overlapping time isn’t double-counted.

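In practice, these can be maintained as follows. This sketch is in Python, and it uses a lock on the assumption that queries execute on multiple threads:

```python
import threading
import time

# Global counters, updated only when a query starts or ends. Read them
# together (under _lock) for a consistent view.
timestamp = time.monotonic()  # time of the most recent start/end event
concurrency = 0               # queries currently executing
busytime = 0.0                # seconds during which at least one query was running
totaltime = 0.0               # summed execution time of all queries, in seconds
queries = 0                   # number of completed queries

_lock = threading.Lock()      # assumption: queries run on multiple threads


def _advance(now):
    """Charge the interval since the last event to busytime and totaltime."""
    global timestamp, busytime, totaltime
    if concurrency:
        busytime += now - timestamp
        totaltime += (now - timestamp) * concurrency
    timestamp = now


def run_query(execute):
    """Run execute() as one query, maintaining the counters around it."""
    global concurrency, queries
    with _lock:
        _advance(time.monotonic())
        concurrency += 1
    try:
        return execute()  # execute the query...
    finally:
        # ...and when it completes:
        with _lock:
            _advance(time.monotonic())
            concurrency -= 1
            queries += 1


def snapshot():
    """Return (time, queries, concurrency, totaltime, busytime) as one reading."""
    with _lock:
        _advance(time.monotonic())  # include in-progress time up to now
        return timestamp, queries, concurrency, totaltime, busytime
```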

I may have missed something there; I’m writing this off the cuff. If I’ve messed up, let me know and I’ll fix it. In any case, these metrics can be used to derive throughput, utilization, average concurrency, response time, and much more through Little’s Law and queueing theory. They also provide the inputs to the Universal Scalability Law. They should be reported by simply reading the variables marked as “global” above, so that the metrics form a consistent view.
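The Python sketch below shows a few of those derivations. It assumes the five counters are sampled twice, at the start and at the end of an observation interval. Throughput is completed queries per second. Utilization is the fraction of the interval during which the server was busy. Average concurrency is the growth in total execution time divided by the interval length. Little's Law (N = X·R) then gives the average response time, and the utilization law (U = X·S) gives the service time. The `Snapshot` type and its field names are hypothetical, chosen to mirror the five metrics listed above.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    """One consistent reading of the five counters (hypothetical field names)."""
    elapsed: float    # 1. time, in seconds
    queries: int      # 2. completed queries
    concurrency: int  # 3. queries executing at the moment of reading (unused below)
    totaltime: float  # 4. summed execution time of all queries, in seconds
    busytime: float   # 5. time with at least one query executing, in seconds


def derive(a: Snapshot, b: Snapshot) -> dict:
    """Derive interval metrics from snapshots taken at the start (a) and end (b)."""
    interval = b.elapsed - a.elapsed
    completed = b.queries - a.queries
    exec_time = b.totaltime - a.totaltime
    busy = b.busytime - a.busytime
    throughput = completed / interval        # X, queries per second
    concurrency = exec_time / interval       # N, average queries in progress
    return {
        "throughput": throughput,
        "utilization": busy / interval,      # U, fraction of the interval spent busy
        "concurrency": concurrency,
        # Little's Law, N = X * R, rearranged: R = N / X.
        "response_time": concurrency / throughput if completed else None,
        # Utilization law, U = X * S, rearranged: S = U / X.
        "service_time": busy / completed if completed else None,
    }


m = derive(Snapshot(100.0, 1000, 2, 50.0, 40.0),
           Snapshot(110.0, 1500, 3, 80.0, 46.0))
print(m)
```

Each interval yields one (average concurrency, throughput) pair. Pairs collected at different load levels are the kind of data a Universal Scalability Law fit could consume.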
