Day two of the conference was a little disappointing, as far as sessions went. There were several time blocks where I simply wasn’t interested in any of the sessions. Instead, I went to the expo hall and tried to pry straight answers out of sly salespeople. Here’s what I attended.
Paying It Forward: Harnessing the MySQL Contributory Resources
This was a talk focused on how MySQL has made it possible for community members to contribute to MySQL. There was quite a bit of talk about IRC channels, mailing lists, and the like. However, the talk gave short shrift to how MySQL plans to become truly open source (in terms of its development model, not its license). I think there was basically nothing to talk about there. I had a good conversation about some of my concerns with the speaker and some others from MySQL right afterwards.
There was basically nobody there — I didn’t count, but I’d say maybe 10 or 12 people. I think this is a telling sign.
Architecture of Maria: A New Storage Engine with a Transactional Design
I was interested in this talk because I’m interested in the tension between Falcon and Maria (and between Falcon and everything, for that matter) but I left and went to the expo hall again after a bit. The talk was good but I’d already seen and/or read it, and the question-and-answer component wasn’t enough to keep me there.
The MySQL Query Cache
This was the second session I gave at the conference, and again it was standing-room-only, with nearly 300 attendees according to the person who was watching the door. The questions were frequent and added a lot to the discussion. Slides will be on the conference website when they post them.
Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL
I was keenly interested in this talk because a) I am a big fan of Patrick Galbraith’s work with many different projects, and b) I had heard a lot about Grazr but didn’t know much about it. However, I missed most of the talk. About ten minutes into it, I got a call I couldn’t refuse: my wife!
However, I did sneak back into the room for the last bit too. And I gave Grazr a try. Unfortunately, I got really confused by it; I tried a bunch of different ways to import my Google Reader’s OPML. I got that to work, but then I couldn’t figure out how to read the feeds in the OPML via Grazr. Then I think I figured that out (I’m not sure) but it didn’t strike me as a very handy way to read my feeds. I’ll try taking another look at it later if I get time. (I’m all ears if there’s a better way to read feeds).
This one was mostly for fun. I knew a lot about UDFs already (I’ve created some) and I knew about the pluggable storage engine API. But I didn’t know about pluggable event daemons. Holy cow, what a great way to shoot yourself (or your server) in the foot! All the power of an atomic bomb, with all the safety of SPF 5 sunblock in a nuclear attack. Or something like that. But darn, it sure is nifty. Brian is a great speaker too — very lively.
You know, there’s another way to extend MySQL that most people don’t seem to know about, which Brian didn’t mention. That is procedures (not stored procedures). They are sort of like a post-filter for a result set, and like UDFs they’ve been around forever. I have never heard of anyone writing their own, but there’s an example in the server itself: PROCEDURE ANALYSE.
I went to the expo hall to meet and greet many of the companies that Percona (my employer) is already working with (doing independent benchmarks, performance verification, analysis, etc.) or will be in the future. I also wanted to grill some of the vendors on their technology. Usually I find them very cagey; they claim X times faster this-or-that, but won’t tell you how, and won’t tell you what their systems don’t do well. I don’t understand why they take this approach; you can’t hide your system’s strong and weak spots. There is no security through obscurity, and shrewd independent observers are going to get to the bottom of it with or without your permission.
So, for instance, I was talking with Tokutek, who claimed to be a drop-in replacement for InnoDB with 200x better performance and apparently no downsides. On closer questioning, though, they admitted that the system has table-level locking. That means it can’t offer any concurrency, so calling it a drop-in InnoDB replacement is questionable. And the comparison against InnoDB seemed contrived: it used bad tuning and a worst-case workload to make InnoDB perform terribly. An honest comparison tunes both systems to their highest performance and measures them; you can’t tune one system as badly as possible and compare it to the other’s best case. I pressed further and asked about range scans in some specific cases (they claim they’re great at range queries, and equal to InnoDB on everything else). At last they admitted they can’t perform well on some very common queries, including real-life queries that InnoDB handles very well for me. They called these “point queries,” but that’s not accurate; in InnoDB you can design indexes to support many different ways to range-query a table and get great performance. So it sounds to me like Tokutek’s storage format is very narrowly focused, and there is indeed a trade-off. I will be interested to see how their technology develops, though. It’s not done yet.
There are a lot of Maatkit t-shirts walking around, which makes me happy. If I’d printed 200 of them, I probably could have given them all away. I was wearing a PostgreSQL t-shirt myself. Proudly, I might add. I’m not the only person here who’s interested in PostgreSQL. This morning I met a person from EnterpriseDB.
Yesterday was a bit slow in terms of interesting sessions, but there was a lot going on in the hallways, the expo hall, the meetings over lunch, and so on.
As I wrote a couple of days ago, I went to the second day of PostgreSQL Conference East 2008 last Sunday. I had a good time and really enjoyed meeting everyone, listening, learning, and occasionally talking. I asked a number of fearless-newbie questions that paid off handsomely: people were very willing to humor me. I also left with a beautiful t-shirt, mug, and bag combo thanks to EnterpriseDB. The bag has already been put to use for a grocery shopping trip.
Note to conference/website organizers: I can’t link to anything but the front page, so I assume my link above will someday point to the 2009 conference, or the 2008 West conference. It would be good to give each event a permalink right from the start…
One thing that surprised me was the distance people traveled to attend. I thought this would be an east-coast USA thing, but people came from Portland, Russia, and beyond.
The first event was an open discussion. At the front of the room were Bruce Momjian, Joshua Drake, Magnus Hagander, and Selena Deckelmann. The first question was about the future of Postgres: what are the goals for the 9.0 release? The answers varied, but generally the sense was that in the future Postgres should continue to add more features and not only catch up to, but surpass the “big boys.” Special mention went to recursive queries, windowing functions, point-in-time recovery, and more standards compatibility.
This was followed by a lengthy discussion on user groups, global vs. local, and so on. One interesting quote here is that no one can buy Postgres because there’s literally no one to buy it from.
After that I poked my hand up and asked what you say to people migrating from other RDBMSs, such as MySQL. I received a warm welcome, a statement that Postgres is hands-down superior to MySQL, period, and a lot of interesting commentary on the differences between the two communities. I have been thinking a lot about the MySQL community and am not yet ready to put my thoughts into words, so I’ll just give an overview of what the panelists said: the communities are quite opposite in many respects, both organizationally and psychologically.
This was followed by a question about how to encourage development of a feature that “people need.” This also went quite deep into the open-source mindset and development methodology, with people pointing out that the Postgres community is a meritocracy and you cannot co-opt it with money. At the same time, what “the community” wants isn’t what goes into the codebase: the itches that get scratched are the hacker itches, not the community itches. Sometimes these are one and the same.
Apparently one of the community itches is in-place upgrades. I gather that an upgrade requires a dump and reload because a new release can’t read the data files written by previous releases. This sounded like a pretty severe problem, yet the “hacker itch” wasn’t there. People said they frequently get told “that’s already solved: dump and reload.” That’s not a solution with large data volumes.
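As a sketch, the dump-and-reload path people described amounts to something like this (ports are illustrative; I’m assuming the old cluster is still running on 5432 and a freshly initdb’ed new cluster is on 5433):

```
# Dump everything from the old server with the NEW version's pg_dumpall
# (the usual recommendation), then replay it into the new cluster.
pg_dumpall -p 5432 > everything.sql
psql -p 5433 -d postgres -f everything.sql
```

Which makes the objection concrete: for a terabyte-scale warehouse, that’s a full export and re-import of every byte, with the downtime to match.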
The discussion then turned to why more people aren’t capable of meeting their own needs. My personal belief here is that the big corporations are buying the minds of the smart people by infiltrating universities and schools, and we (we the citizens of the USA, not we the hackers) are just standing by and letting it happen as though it’s a good thing for powerful vested interests to be “giving” our schools “free” software and other things that they cannot inspect, hack, and change. The other problem is that universities aren’t teaching data. They’re teaching everything but data, yet that’s the most important part of the technology economy today. Tools are not as important: they exist only to work with data. You’re lucky to find someone who’s been university-educated in any database, much less an open-source or Free Software one.
Most of what I heard from the panel agreed with my personal views, but the panelists didn’t focus on the problem in the universities as much as I feel is important. And just as importantly, I didn’t hear as much recognition as I wished that there’s a real chance to change this: commercial/open-source companies like EnterpriseDB can really pull a long lever here by counter-infiltrating the classroom. Aside from legislating proprietary software right out of the classroom (which I think would be a good start), we can subvert it too.
Around this point someone in the room opined that one of the things that’s unique about Postgres is the difficulty of finding a competent DBA, and the expense of hiring one. This person said that it’s easy to find Oracle DBAs, and that good MySQL DBAs are a dime a dozen at $35,000 USD per year. I kept my mouth shut, but suffice to say this is not my experience at all. I think we’re all in the same boat here, and this is a case of the grass looking greener on the other side.
The great quote I heard in this session was “We take Oracle DBAs and try to break them.” Someone please step up and take credit for that one :-)
SQL/XML for Developers
This talk was by Lewis Cunningham of EnterpriseDB. He introduced people to XML and then showed the functions recently added to Postgres (in 8.3, I believe) for manipulating XML documents and document fragments. There’s also a native XML datatype, which I asked a few questions about. Apparently it is TEXT under the hood, with a well-formedness check in front of it. I asked a little about the storage format, and was told that large TEXT values are stored out-of-line, LZ-compressed, and not allocated a page at a time as in MySQL’s InnoDB engine, so it’s not as wasteful. (I wanted to get a sense of whether storing XML in Postgres would be very inefficient from a memory/disk point of view.)
I asked about indexing. Since Postgres offers functional indexing (that is, you can index the result of a function — not “its indexing works”), in theory you could index XML documents by indexing the result of an XPath expression, for example. I was looking for the “yes, but” and I got it: there are some planner (query optimizer, for MySQL folks) limitations to this approach.
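To make that concrete, a functional index over an XPath expression might look something like this (the table and column names are hypothetical; xpath() returns an array, so you index one element of it, cast to text):

```sql
-- Hypothetical table "docs" with an XML column "body": index the
-- first /book/author text node so lookups by author can use the index.
CREATE INDEX docs_author_idx
    ON docs (((xpath('/book/author/text()', body))[1]::text));

-- The query must repeat the same expression for the planner
-- to consider the index at all -- one of the limitations mentioned.
SELECT * FROM docs
WHERE (xpath('/book/author/text()', body))[1]::text = 'Lewis Cunningham';
```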
The great quote from this session was the response to “what would you use instead of Hibernate?” (Hibernate is a Java ORM system). The response was “hand-code it in assembly.” Beautiful.
Big, Bad, Broken, PostgreSQL
This talk was by Robert Treat of OmniTI. He described how a data warehouse turned into a train wreck and how they recovered it. The exact cause of failure is apparently still not known. But it sounded like an interesting, sleepless time. This was a pretty technical discussion. One thing I found interesting was the definition of “large” data warehouse. To my mind, a terabyte or two isn’t exceptionally large. Is that very large in the Postgres world? I’m not trying to be a jerk… just trying to understand. I think one of the reasons it might be large goes back to what people were saying about the need to dump and reload for every upgrade: doing that for a TB of data sounds like a significant barrier to building really large systems.
Monitoring PostgreSQL with ptop
This session was given by Selena Deckelmann. ptop is a top clone that is literally derived from the Unix top utility. It can monitor current queries as well as look at statistics from the operating system itself.
(Tangent: This is an interesting approach, and one which an innotop user has said he’s working on adding to innotop. innotop can monitor many systems at once, but it doesn’t monitor the operating system — it talks only to the MySQL server. This user was talking about opening an SSH connection to each server and looking at /proc/vmstat and /proc/diskstats as well).
Sorry for going off on a tangent. Anyway, ptop is a C app that Selena and one other person maintain. It can show the current processes, list of locks, explain queries, and so on. One interesting limitation is that it can’t monitor a whole server: it’s constrained to a single database. I gather this is because PostgreSQL’s statistics views, which it queries, are per-database.
After the conference ended, a few of us piled into cars and followed Bruce into DC for a tour. We visited the Lincoln Memorial, the Viet Nam Memorial, went through the World War II memorial, and up to the Washington Monument. At this point I split and went back home.
All in all a great time and great people, and I’m sorry I missed the first day. This event is so close to me (a 3-hour drive) that I will really try to make the entire weekend next time, unless it again conflicts with my wife’s 10-mile race schedule.