Tag Archive for 'postgresql'

PostgreSQL Conference East 2008

PostgreSQLAs a I wrote a couple of days ago, I went to the second day of PostgreSQL Conference East 2008 last Sunday. I had a good time and really enjoyed meeting everyone, listening, learning, and occasionally talking. I asked a number of fearless-newbie questions that paid off handsomely: people were very willing to humor me. I also left with a beautiful t-shirt, mug, and bag combo thanks to EnterpriseDB. The bag has already been put to use for a grocery shopping trip.

Note to conference/website organizers: I can’t link to anything but the front page, so I assume my link above will someday point to the 2009 conference, or the 2008 West conference. It would be good to give each event a permalink right from the start…

One thing that surprised me was the distance people traveled to attend. I thought this would be an east-coast USA thing, but people came from Portland, Russia, and beyond.

Roundtable

The first event was an open discussion. At the front of the room were Bruce Momjian, Joshua Drake, Magnus Hagander, and Selena Deckelmann. The first question was about the future of Postgres: what are the goals for the 9.0 release? The answers varied, but generally the sense was that in the future Postgres should continue to add more features and not only catch up to, but surpass the “big boys.” Special mention went to recursive queries, windowing functions, point-in-time recovery, and more standards compatibility.

This was followed by a lengthy discussion on user groups, global vs. local, and so on. One interesting quote here is that no one can buy Postgres because there’s literally no one to buy it from.

After that I poked my hand up and asked what you say to people migrating from other RDBMSs, such as MySQL. I received a warm welcome, a statement that Postgres is hands-down superior to MySQL period, and a lot of interesting commentary on the differences in the communities between the two. I have been thinking a lot on the MySQL community and am not yet ready to put my thoughts into words, so I’ll just give an overview of what the panelists said: the communities are quite opposite in many respects, both organizationally and psychologically.

This was followed by a question about how to encourage development of a feature that “people need.” This also went quite deep into the open-source mindset and development methodology, with people pointing out that the Postgres community is a meritocracy and you cannot co-opt it with money. At the same time, what “the community” wants isn’t what goes into the codebase: the itches that get scratched are the hacker itches, not the community itches. Sometimes these are one and the same.

Apparently one of the community itches is in-place upgrades. I gather that an upgrade requires a dump and reload because releases are not capable of reading files written by previous releases. This sounded like a pretty severe problem, yet the “hacker itch” wasn’t there. People said that they frequently get told “that’s already solved: dump and reload.” Not a solution with large data volumes.

The discussion then turned to why more people aren’t capable of meeting their own needs. My personal belief here is that the big corporations are buying the minds of the smart people by infiltrating universities and schools, and we (we the citizens of the USA, not we the hackers) are just standing by and letting it happen as though it’s a good thing for powerful vested interests to be “giving” our schools “free” software and other things that they cannot inspect, hack, and change. The other problem is that universities aren’t teaching data. They’re teaching everything but data, yet that’s the most important part of the technology economy today. Tools are not as important: they exist only to work with data. You’re lucky to find someone who’s been university-educated in any database, much less an open-source or Free Software one.

Most of what I heard from the panel agreed with my personal views, but they didn’t focus on the problem in the university as much as I feel is important. And just as importantly, perhaps I didn’t hear enough recognition as I wished that there’s a real chance to change this: commercial/opensource companies like EnterpriseDB can really pull a long lever here by counter-infiltrating the classroom. Aside from just legislating the proprietary software right out of the classroom — which I think would be a good start — we can subvert them also.

Around this point someone in the room opined that one of the things that’s unique about Postgres is the difficulty of finding a competent DBA, and the expense of hiring them. This person said that it’s easy to find Oracle DBAs, and you can hire a good MySQL DBA a dime-a-dozen for $35,000 USD per year. I kept my mouth shut, but suffice to say this is not my experience at all. I think we’re all in the same boat here, and this is a case of the grass looking greener on the other side.

The great quote I heard in this session was “We take Oracle DBAs and try to break them.” Someone please step up and take credit for that one :-)

SQL/XML for Developers

This talk was by Lewis Cunningham of EnterpriseDB. He introduced people to XML and then showed the functions that have recently (which release?) been added to Postgres for manipulating XML documents and document fragments. There’s also a native XML datatype, which I asked a few questions about. Apparently it is TEXT under the hood, with a well-formedness check in front of it. I asked a little about the storage format, and was told TEXT is stored out-of-line for large values, lz-compressed, and not allocated a page-at-a-time as with MySQL’s InnoDB engine (so it’s not as wasteful — I wanted to get a sense of whether it would be very inefficient to store XML in Postgres from the memory/disk point of view).

The XML functionality in Postgres is built upon the excellent and ubiquitous libxml2 library. The developer who coded the XML functionality was also in the room. His name is Nikolay Samokhvalov.

I asked about indexing. Since Postgres offers functional indexing (that is, you can index the result of a function — not “its indexing works”), in theory you could index XML documents by indexing the result of an XPath expression, for example. I was looking for the “yes, but” and I got it: there are some planner (query optimizer, for MySQL folks) limitations to this approach.

The great quote from this session was the response to “what would you use instead of Hibernate?” (Hibernate is a Java ORM system). The response was “hand-code it in assembly.” Beautiful.

Big, Bad, Broken, PostgreSQL

This talk was by Robert Treat of OmniTI. He described how a data warehouse turned into a train wreck and how they recovered it. The exact cause of failure is apparently still not known. But it sounded like an interesting, sleepless time. This was a pretty technical discussion. One thing I found interesting was the definition of “large” data warehouse. To my mind, a terabyte or two isn’t exceptionally large. Is that very large in the Postgres world? I’m not trying to be a jerk… just trying to understand. I think one of the reasons it might be large goes back to what people were saying about the need to dump and reload for every upgrade: doing that for a TB of data sounds like a significant barrier to building really large systems.

Monitoring PostgreSQL with ptop

This session was given by Selena Deckelmann. ptop is a top clone that is literally derived from the Unix top utility. It has the ability to monitor current queries as well as looking at the statistics from the operating system itself.

(Tangent: This is an interesting approach, and one which an innotop user has said he’s working on adding to innotop. innotop can monitor many systems at once, but it doesn’t monitor the operating system — it talks only to the MySQL server. This user was talking about opening an SSH connection to each server and looking at /proc/vmstat and /proc/diskstats as well).

Sorry for going off on a tangent. Anyway, ptop is a C app that Selena and one other person maintain. It can show the current processes, list of locks, explain queries, and so on. One interesting limitation is that it can’t monitor a whole server: it’s constrained to a single database. I gather this is because PostgreSQL’s statistics views, which it queries, are per-database.

Afterward

After the conference ended, a few of us piled into cars and followed Bruce into DC for a tour. We visited the Lincoln Memorial, the Viet Nam Memorial, went through the World War II memorial, and up to the Washington Monument. At this point I split and went back home.

All in all a great time and great people, and I’m sorry I missed the first day. This event is so close to me (3 hours drive) that I will really try to make the entire weekend next time, unless it again conflicts with my wife’s 10-mile race schedule.

Technorati Tags:, , , , , , , , , , , , ,

You might also like:

  1. Going to PostgreSQL Conference East
  2. Like it or not, it is the MySQL Conference and Expo
  3. MySQL Conference and Expo 2008, Day Two
  4. MySQL Heartbeat supports PostgreSQL

Going to PostgreSQL Conference East

I’m heading out to PostgreSQL Conference East in a few hours. Alas, I missed the first day but I’ll be there all day tomorrow. I hope to learn, meet people, and generally participate in goodness.

And no, I’m not an expert in PostgreSQL as I am in MySQL, but I have always held it in very high esteem. I am not going to try to spread the MySQL-ness into the PostgreSQL camp :-)

See you there.

Technorati Tags:,

You might also like:

  1. Like it or not, it is the MySQL Conference and Expo

What is new in Maatkit

My posts lately have been mostly progress reports and release notices. That’s because we’re in the home stretch on the book, and I don’t have much spare time. However, a lot has also been changing with Maatkit, and I wanted to take some time to write about it properly. I’ll just write about each tool in no particular order.

Overall

I’ve been fixing a fair number of bugs, most of which have been in the code for a while. Every bug I fix these days gets a test case to guard against regressions. I’ve integrated the tests into the Makefile, so there’s no way for me to forget to run them.

The test suite has hundreds of tests, which is probably pretty good in comparison to many projects of this type. However, there will probably never be enough tests. I’ve moved much (in some cases, almost all) of the code into modules, which are easy to test, but it’s always a little harder to test programs themselves, so some things aren’t tested. (For example, it’s tedious to set up a test case that requires many MySQL instances to be running in a multi-tier replication setup).

Still, I think the quality has increased a lot in the last 6 months or so, since I’ve been more disciplined about tests. That discipline, by the way, was forced on me. The mk-table-sync tool was completely unmanageable. I was able to rewrite that tool in December, almost entirely using modularized, tested code.

mk-heartbeat

Jeremy Cole and Six Apart originally contributed this tool. Since then I’ve added a lot more features, allowed a lot more control over how it works, and it even works on PostgreSQL now. As an example, I added features that make it easy to run every hour from a crontab. It daemonizes, runs in the background, and then quits automatically when the new instance starts. I use it in production to give me a reliable metric for how up-to-date a slave is. When I need to know absolutely “has this slave received this update,” Seconds_behind_master won’t do, for many reasons. Load balancing and lots of other things hinge on up-to-date slaves.

mk-parallel-dump

I think this tool is probably the fastest, smartest way to do backups in tab-delimited format. I’ve been fixing a lot of bugs in this one, mostly for non-tab-delimited dumps. It has turned out to be harder to write this code because it uses shell commands to call mysqldump. (The tab-delimited dumps are done entirely via SQL, which is why it’s so good at what it does).

mk-slave-restart

I’ve been having a lot of trouble with relay log corruption, so unfortunately this tool has become necessary to use regularly in production. As a result I made it quite a bit smarter. It can detect relay log corruption, and instead of the usual skip-one-and-continue, it issues a CHANGE MASTER TO, so the slave will discard and re-fetch its relay logs. I’ve also made it capable of monitoring many slaves at once. (It discovers slaves via either SHOW SLAVE HOSTS or SHOW PROCESSLIST, so if you point it at a master, it can watch all the master’s slaves with a single command).

mk-table-checksum

I’ve made a lot of changes to this tool recently. Smarter chunking code to divide your tables into bits that are easier for the server to work with, TONS of small improvements and fixes, and much friendlier behavior.

The most recent release also includes a big speed improvement. Most of the time this tool spends is waiting for MySQL to run checksum queries. While my pure-SQL checksum queries are faster than most (all?) other ways to compare data in different servers, I’ve recently been trying to reduce the amount of work they cause.

As a result, I investigated Google’s MySQL patches. Mark Callaghan mentioned to me that he’d added a checksum function into their version of the server, and I wanted to look at that. They’re using the FNV hash function to checksum data. I decided that a UDF would be a fine way to write a faster row-checksum function, so I wrote a 64-bit FNV hash UDF. While I’m not the first person to do that, my version accepts any number of arguments, not just one. This makes it a lot more efficient to checksum every column in a row, because you don’t have to a) make multiple calls to the hash function or b) concatenate the arguments so you can make a single call. I also copied Google’s logic to make it simpler and more efficient to checksum NULLs, which avoids still more function calls. The UDF returns a 64-bit number, which can be fed directly to BIT_XOR to crush an entire table (or group of rows) into a single order-independent checksum. And finally, FNV is also a lot faster than, say, MD5 or SHA1.

The results are quite a bit faster for my hardware: 12.7 seconds instead of 80 seconds on a CPU-bound workload. So that’s at least a 6.2x speedup. (80 seconds was the best I was able to achieve before. Some of the checksum techniques used up to 197 seconds on the same data).

The UDF is really simple to compile and install, does no memory allocations or other nasty things, and should be safe for you to use. The source is included with the latest Maatkit release. (Older Maatkit versions won’t be able to take full advantage of it, by the way, but they can still be sped up somewhat). However, I would really appreciate some review from more experienced coders. I’m no C++ wizard. In fact, my first attempts at writing this thing were so blockheaded and wrong, I was almost embarrassed. (Thanks are due to the fine people hanging out on #mysql-dev).

mk-table-sync

After my week-long coding marathon on this in December, I’ve needed to continue working on this. I’ve needed it quite a few times to solve problems with replication. (Did I mention relay log corruption?). It’s much faster and less buggy now, and as a bonus, the latest release can also take advantage of the FNV UDF I just mentioned.

I think I should explain the general evolution in this tool’s life. It started out as “how to find differences in data efficiently.” This was a period where I did a lot of deep thinking on exploiting the structures inherent in data. It then progressed to “how to sync data efficiently.” At this point I was able to outperform another data-syncing tool by a wide margin, even though it was a multi-threaded C++ program and mine was just a Perl script. I did that by writing efficient queries and moving very little data across the network.

The most recent incarnation has thrown performance out the window, at least as measured by those criteria. The aforementioned C++ program now outperforms mine by a wide margin on the same tests.

What changed?

Two things: I’m focusing on quality, and I’m focusing on syncing running servers correctly with minimal interruption.

Once I have good-quality, well-tested code, I’ll be able to speed it up. I know this because I’m currently doing some things I know are slower than they could be.

But much more importantly, I’ve changed the whole angle of the tool. I want to be able to synchronize a busy master and slave, without locking tables, automatically ensuring that the data stays consistent and there are no race conditions. I do this with a lot of special tricks, such as syncing tables in small bits, using SELECT FOR UPDATE to lock only the rows I’m syncing, and so on. And I’m actively working to make the tool Do The Right Thing without needing 99 command-line arguments. (I think the latest release does this very well).

Instead of “make the sync use as little network traffic as possible,” I’ve changed the criteria of good-ness to “do it right, do it once, and don’t get in the way.”

As a result, I can sync a table that gets a ton of updates — one of the “hottest” tables in my application — without interfering with my application. Online. Correctly. In one pass. Through replication. Show me another tool that can do that, and I’ll re-run my benchmarks :-)

This doesn’t mean I don’t care about performance. I do, and I’ll bring back the earlier “go easy on the network” sync algorithms at some point. They are very useful when you have a slow network, or your tables aren’t being updated and you just want to sync things fast. I’ll also be able to speed up the “don’t interfere with the application” algorithms.

One interesting thing I did was divide up the functionality so the tool can use many different sync algorithms. I created something like a storage-engine API, except it’s a sync API. It’s really easy to add in new sync algorithms now. All I have to do is write the code that algorithm needs. This is really only about 200-300 lines of code for the current algorithms.

Tools that don’t yet exist

What I haven’t told you about is a lot of unreleased code and new tools. There’s some good stuff in the works. Also stay tuned — a third party might be about to contribute another tool to Maatkit, which will also be a very neat addition.

Conclusion

As Dana Carvey says, “If I had more time… the programs we have in place are getting the job done, so let’s stay on course, a thousand points of light. Well, unfortunately, I guess my time is up.” Maatkit is getting better all the time, just wait and see.

Technorati Tags:, , , , , , , , , ,

You might also like:

  1. How to sync tables in master-master MySQL replication
  2. Maatkit version 1417 released
  3. Progress on Maatkit bounty, part 3
  4. Maatkit version 1674 released
  5. Maatkit bounty begins tomorrow

MySQL Heartbeat supports PostgreSQL

Thanks to a patch Dane Miller submitted, MySQL Heartbeat can now be used to monitor Slony-I replication lag for PostgreSQL. It works identically: it updates a record on the master and looks for it on the slave. You just have to specify the database driver on the command-line:

mysql-heartbeat [options] --dbidriver=Pg

Most of the tools in the MySQL Toolkit are very MySQL-specific. This one just happened to be an exception, and I’m happy it’s useful for more than the original purpose.

Technorati Tags:, , , , ,

You might also like:

  1. MySQL Toolkit version 896 released
  2. Going to PostgreSQL Conference East

How to implement a queue in SQL

This article explains how to create a fixed-size FIFO (first-in, first-out) queue in SQL, where rows added after a threshold will cause the oldest row to be deleted. There are several ways to do this, but MERGE on Oracle and DB2, and MySQL’s non-standard extensions to SQL, make an elegant solution easy.

Update a PostgreSQL blogger pointed out the obvious method I missed: triggers! There’s also a really neat PG-specific feature that allows it to work even more elegantly on that platform. Well worth a read. I sometimes wish I worked at a PostgreSQL shop so I could have time to learn as much about it as I’ve learned about MySQL.

Method one: do it with a single query in MySQL

Since I’m most familiar with MySQL, I’ll explain it in detail for MySQL. I’m sure a competent Oracle or DB2 developer can translate it to those platforms.

It is not possible to simultaneously INSERT and DELETE in standard SQL. However, in MySQL it is possible to simultaneously INSERT and UPDATE, with the ON DUPLICATE KEY UPDATE syntax. Another way to do it is with REPLACE, which actually works as a DELETE and INSERT. I’ve written about these before (flexible INSERT and UPDATE statements in MySQL — one of my most popular articles). You may not like them because they’re non-standard, but they’re available. I believe in using what my tools give me.

For these queries to work, you need to let inserts proceed as normal until the limit (say, 5) is reached. After that, new inserts need to create a unique index violation, and then the two-things-at-once functionality of the statement kicks in.

There are probably many ways to do this, but for this article, I’m going to imagine the table as a fixed-size, fixed-order queue. Once the list is full, new inserts “wrap around” and begin again at the bottom, travelling up through the rows one at a time. Each new insert then over-writes an existing row until it reaches the top and wraps around again.

For simplicity’s sake, I’m also going to imagine that nothing ever deletes any rows from this table. That way there’ll be no gaps in the sequence I’ll use to make the wrap-around work.

Here’s the table definition:

CREATE TABLE q (
   id int NOT NULL,
   modulo int NOT NULL,
   fruit varchar(10) NOT NULL,
   PRIMARY KEY(id),
   UNIQUE KEY(modulo)
)

This table has two unique keys: one to serve as a monotonically increasing “row number,” and one to cause the wrap-around effect to work. The only real data is the fruit column. Here’s a query to insert “apples” into the queue:

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'apples'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit)

Here’s what the query does: it finds the maximum value of id in the table, which ought to be efficient since it’s indexed. If there are no rows, the result will be NULL, which COALESCE() converts into -1. Then it adds one to that value, which will become the next largest value in the id sequence. What I’m really doing here is rolling my own AUTO_INCREMENT, with a slight twist: the sequence starts at zero, not one.

The sequence needs to start at zero to make the modulo arithmetic easy to understand and work with. At the same time I’m inserting that value into id, I’m also dividing it by the desired size of the table, and inserting the remainder into the modulo column. When the table gets “full,” that column will already contain the calculated value, and there’ll be a unique index violation. Then the ON DUPLICATE KEY UPDATE clause will fire and update the existing row instead of inserting a new one.

Here’s what the table contains after the above insert:

select * from q;
+----+--------+--------+
| id | modulo | fruit  |
+----+--------+--------+
|  0 |      0 | apples |
+----+--------+--------+

Let me now insert four more rows for oranges, peaches, cherries and pears, so the queue is full:

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'oranges'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'peaches'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'cherries'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'pears'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

Each row I inserted caused MySQL to print the following back to the command prompt:

Query OK, 1 row affected (0.05 sec)
Records: 1  Duplicates: 0  Warnings: 0

And now, the contents of the table:

select * from q;
+----+--------+----------+
| id | modulo | fruit    |
+----+--------+----------+
|  0 |      0 | apples   |
|  1 |      1 | oranges  |
|  2 |      2 | peaches  |
|  3 |      3 | cherries |
|  4 |      4 | pears    |
+----+--------+----------+

Now I’ll insert another row for bananas:

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'bananas'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

That query should have wrapped around to the beginning of the queue and triggered the unique index violation. As a result, MySQL should have overwritten the ‘apples’ row. In fact, the messages at the command prompt indicate something did happen:

Query OK, 2 rows affected (0.03 sec)
Records: 1  Duplicates: 1  Warnings: 0

Two rows were “affected” because of the duplicate key (you can read the MySQL manual for more on what “rows affected” really means). And there was indeed a duplicate row. Here’s what’s in the table now:

select * from q order by modulo;
+----+--------+----------+
| id | modulo | fruit    |
+----+--------+----------+
|  5 |      0 | bananas  |
|  1 |      1 | oranges  |
|  2 |      2 | peaches  |
|  3 |      3 | cherries |
|  4 |      4 | pears    |
+----+--------+----------+

Notice I ordered that query by modulo to show the entries in the same order as before. The “oldest” row, which is at the “front” of the queue, is now the one with the smallest value of id, so to see them “in queue order,” you can order by id:

select * from q order by id;
+----+--------+----------+
| id | modulo | fruit    |
+----+--------+----------+
|  1 |      1 | oranges  |
|  2 |      2 | peaches  |
|  3 |      3 | cherries |
|  4 |      4 | pears    |
|  5 |      0 | bananas  |
+----+--------+----------+

Method two: use REPLACE on MySQL

If it’s easier to write your query this way, or you need support on older versions of MySQL, you can use REPLACE instead of INSERT... ON DUPLICATE KEY UPDATE. Here’s an example:

replace into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'bananas'
   from q;

The query may be more or less efficient, depending on your MySQL version, the storage engine you chose, and so forth. If I were doing this in production, I’d test it, probably with my MySQL Query Profiler tool. Hint, hint!

Methods three and four: on other platforms

Another option, which will allow the same easy single-query solution, is to use MERGE on Oracle and DB2. MERGE, REPLACE and friends are what we database folks call upsert queries, because they insert or update. If this functionality is available on other platforms, let me know. It looks like it’s still on the TODO list for PostgreSQL, and I’m fairly certain it’s not in SQL Server 2005. Perhaps a future release of one of these products will offer it.

Till then, I think the best option on these platforms would be a transaction with a couple statements to check the table and either insert or update (or delete and then insert, depending on how you want to do it). This fourth method can be made completely portable across platforms, if that’s important for your use case.

Things to think about

If you implement a system like this, consider the edge cases. Are you ever going to delete rows from the queue? If so, does that mess with the desired behavior of new inserts? Are there any ways you can get a hole in the sequence? If so, what happens — do you get too few rows in the queue, overwrite something other than the oldest row, or something else? If you need to “process” items in the queue, maybe you can just mark them as processed rather than deleting them.

What if you want to insert multiple rows at once? If you need to go that route on MySQL, my past articles might help. You could use advanced user variable techniques to number several rows at once. I also talked about related techniques in how to write INSERT IF NOT EXISTS queries.

Conclusion

In this article I showed you several ways to let a table grow to a fixed size, after which new rows replace old rows. I haven’t personally used this in production; this article grew out of a reader’s question (thanks for the stimulating topic!). If this article helped you, you should consider subscribing for future updates via email or feeds. It’s free and convenient.

Technorati Tags:, , , , ,

You might also like:

  1. SQL Server 2000 date and time puzzler
  2. How to avoid unique index violations on updates in MySQL
  3. How to write a UNION in SQL without using UNION
  4. How to notify event listeners in MySQL
  5. How to delete duplicate rows with SQL, Part 2