Archive for the ‘mysqlconf07’ tag
In my third day at the MySQL Conference and Expo 2007, I again attended keynotes and sessions, one of which I participated in. This evening I had dinner with a fellow community member and arrived late to the Quiz Show, even though I was supposed to be on one of the teams! I blame it on the restaurant, because they took too long to figure out what I meant when I said “können wir einen Hubschrauber essen heute abend?”
Today I attended what were, by a decent margin, the best sessions I’ve been to all week. If you don’t think they’re saving the best for last, come to my tutorial and demo on monitoring MySQL and InnoDB with innotop tomorrow and see!
Just two quick notes: I am recording the sessions I attend on my iRiver when possible, and will post the audio for download after I get home. Also, you can click on the headings of each of the talks; I have linked them to the session description.
There were again three keynotes this morning. Eben Moglen delivered a fantastic, thought-provoking speech with which I mostly agreed. I was working on innotop during the others, though I was in the room.
Lunch was… I forgot to write it down. A salad and mixed vegetables, a roll, tomatoes that I had to cut. I don’t know. I was trying to meet some folks in the exhibit hall and it’s all a blur now.
This session was mostly a demo and/or sales pitch by three engineers from NitroSecurity. The technology seems well-done, but as far as I can tell the storage engine is not going to be GPL’ed. Too bad they’re missing a big opportunity. On the plus side, they did write some software to ease integration with the storage engine API, and that’s GPL’ed (probably because it has to be, since I guess it’s going to be linked with MySQL).
This session was amazing! It was standing-room-only. Beat Vontobel started out with the classic “animals” guessing game, except it was a series of questions to figure out which programming language you were thinking of. The demo was live, running on his own server on another continent. You simply select from the questions table, and it asks you one question, which is just a single column of text. You insert your answer into the answers table, which is a single yes/no enum column. Then you select from the questions table again… and there’s a different question now. As you continue, it narrows down the choices and eventually guesses what you’re thinking of.
Behind the scenes, even though all you can see is the questions and answers, is a series of views. No procedures, triggers, or functions.
The idea? SQL is a declarative language, and can — and should — be used as a logic language, much like Prolog or Lisp. And the basic building block of the language is the view, which expresses a predicate. I would have said it’s a functional language, and the SELECT statement is the way to express a predicate; a view is an abstraction over the SELECT. But I’m not going to argue with such eloquence.
(And for those of you who saw me raise my hand to the “do you program in Lisp” question, no, it’s not because I’m an Emacs user. I’m not, I’m a Vim user. I use Lisp for artificial intelligence and expert systems).
As if his demo had not made the point forcefully, Beat then proceeded to show us Prolog and SQL code for “who is a sibling of who,” side by side. The parallel is obvious. It was tremendously impressive. If you weren’t there, download the slides and read them.
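To give a flavor of the parallel, here is a minimal sketch of a “sibling” predicate written as a view. It is not Beat’s code: the `parent_of` table, its rows, and the `siblings_of` helper are made up for illustration, and it uses Python’s built-in SQLite module rather than MySQL so that it runs anywhere.

```python
import sqlite3

# Prolog version of the same rule, for comparison:
#   sibling(X, Y) :- parent_of(P, X), parent_of(P, Y), X \= Y.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts: one row per parent/child pair (hypothetical data).
    CREATE TABLE parent_of (parent TEXT, child TEXT);
    INSERT INTO parent_of VALUES
        ('alice', 'bob'), ('alice', 'carol'), ('dave', 'erin');

    -- The rule: two different children sharing a parent are siblings.
    CREATE VIEW sibling AS
        SELECT a.child AS x, b.child AS y
        FROM parent_of AS a
        JOIN parent_of AS b ON a.parent = b.parent
        WHERE a.child <> b.child;
""")

def siblings_of(name):
    """Query the predicate: who is a sibling of `name`?"""
    rows = conn.execute(
        "SELECT y FROM sibling WHERE x = ? ORDER BY y", (name,)
    )
    return [row[0] for row in rows]
```

The view plays the role of the Prolog rule and the table plays the role of the facts; selecting from the view is the query.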
This session was a very technical discussion of how isolation, Multi-Versioning Concurrency Control, durability and locking work in solidDB. The engine implements both pessimistic and optimistic locking models on a per-table basis, though the terms “pessimistic” and “optimistic” are somewhat misnomers, as is “locking.” It has more to do with record version numbers than locks, as I understand it. Durability (the D in ACID) can also be set to relaxed or strict, at the session level. These features, and the discussion around them, brought the compromises of speed, concurrency, storage, and durability into stark clarity. I’m impressed by solidDB’s range of choices for the DBA. Of course they are also going to support some notion of compatibility with InnoDB’s behavior.
Interestingly, you can mix pessimistically and optimistically locked tables in a single transaction, and the behavior is not upgraded or downgraded — you get pessimistic behavior on one table and optimistic on the other.
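The difference between the two models can be sketched in a few lines. This is a generic illustration of version-checked (optimistic) versus lock-first (pessimistic) updates, not solidDB’s implementation; the `Record` class and both function names are invented for the example.

```python
import threading

class ConflictError(Exception):
    """Raised when an optimistic update finds the record has changed."""

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0              # bumped on every successful write
        self.lock = threading.Lock()  # used only by the pessimistic path

def optimistic_update(record, read_version, new_value):
    # No lock is held while the caller works. At write time we only check
    # that nobody else has written since we read. A real engine would make
    # this check-and-write step atomic; this sketch is single-threaded.
    if record.version != read_version:
        raise ConflictError("record changed since it was read")
    record.value = new_value
    record.version += 1

def pessimistic_update(record, compute):
    # Take the lock first, so competing writers wait instead of failing.
    with record.lock:
        record.value = compute(record.value)
        record.version += 1
```

In the optimistic case the loser of a race finds out at write time and has to retry; in the pessimistic case it simply waits its turn.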
I have not yet learned how solidDB will be licensed.
This session was packed. Robin Schumacher took the first part of it, showing what’s planned for the entire MySQL product line over the next year or so. It was a talk calculated to make the audience spend the next year squirming in anticipation. Oooh, finally I’ll get enhanced replication monitoring, and subqueries will get decently optimized, and…!!!!
Robin is a confident, eloquent speaker. The kind of person whom I imagine promises things that make the developers in the audience cringe slightly. “Replication conflict detection! Next slide.”
He gave a demo of the upcoming online backup of a large table while selecting from the table in another session, but amusingly didn’t seem to notice that the SELECT queries in the other window were failing with syntax errors. (Never mind, though… you can do the demo yourself if you want; he included the code you’ll need on a recent article. Robin, if you’re reading this: if you noticed the statements failing, you are one cool customer to continue without missing a beat!)
He followed this with a quick mention of storage engine partners and then proceeded to pitch MySQL Enterprise and MySQL Workbench. Afterwards he finished up with a quote that went something like this: “Backup is coming. It’s real. It’s working.” ;-) Seriously, I believe him.
Jeffrey Pugh took the microphone at this point. He showed us a timeline of MySQL history and features, where we are today, and again what’s planned. Interestingly, he made a public admission of 5.0 being released before being ready, and said this mistake will not be repeated — but apparently sometime in the last week or so MySQL has decided to skip directly from version 5.1, omitting 5.2 and going right to 6.0. Is this version number inflation? It seems like it. Here are some semi-quotes: “We don’t want bugs in 6.0. We don’t want to repeat 5.0.” So, how about not jumping into it? Give it some breathing room.
Someone asked if MySQL will include other programming languages, like an embedded JVM, and Robin and Jeffrey made soothing noises into the microphone. My reaction, in case anyone’s listening, is: for the love of all that’s good and pure, keep those things the heck away from MySQL. Please! It is and should be an RDBMS, or at least tries to be and ought to keep trying to. If you embed these things in it, next thing you know it’ll be like Microsoft SQL Server where you can run a fricking web service from a stored procedure.
And of course, there was a discussion about the perpetual topic: what is the difference between Community and Enterprise? I was surprised to hear Robin and Jeffrey correcting each other on this!
All in all, a lively session. Nothing is boring around here.
I got talking to an engineer, who shall not be named, from a Big Company we all know and love (but which shall not be named) in the hall. Thus I arrived late to the session in which I was supposed to participate. Fortunately, I was not listed first. It was a series of community contributors giving lightning-fast (well, sometimes) talks about our experiences as community members. While I sat listening, something strange happened; I began to think in a different way than I had prepared to speak. Thus when it was my turn, I ignored my slides and spoke extemporaneously. I suppose this is a good thing; one is not supposed to read one’s slides.
Holy cow, was this a great session. This was the most riveting thing I’ve seen all week. You could really tell who was into this kind of geekiness, because there weren’t that many people in the room. I even tried to record the question and answer session during the intermission as we all crowded around Timour Katchaounov at the lectern.
This session went deep into how the optimizer really works. Topics included how it is similar to and different from other database systems (most of them actually generate machine code; MySQL does not), what it does and why, and what’s coming in the next versions. And for the first time I really understand why MySQL’s core developers think the output of EXPLAIN is somehow understandable to an ordinary mortal (by the way, I have been planning for a while to reverse translate EXPLAIN into a tree view for the rest of us. I’ll get to it, really).
Timour explained MySQL’s cost-based query optimization, which is built on “units of disk access.” He showed its evolution from pre-5.0, where it was an exhaustive search of all possible execution plans, which is O(n!) and didn’t perform well on more than a handful of tables in a join. I never had this happen to me, but apparently you could quite easily write queries that would take hours, days, weeks just to generate a query plan, and that’s before you even started to execute! These days you can join up to 62 tables, and the algorithm uses exhaustive left-deep search up until a threshold (currently 7 tables), after which it becomes greedy and can choose a non-optimal plan. At least it’ll terminate, though.
I have good news for the query optimization team, though: my brother has solved the Travelling Salesperson problem, which is NP-complete of course. Obviously left-deep search can be transformed into this; so this problem is solved as well. I’m sure it will only be a matter of time before the patents go through, so who’s the highest bidder for the best query optimizer on the planet? Anyone?
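The trade-off between exhaustive and greedy search is easy to see in a toy version. The sketch below is a simplification under loud assumptions: the cost model (sum of intermediate result sizes), the `rows` and `selectivity` inputs, and the hard switch at the threshold are all invented for illustration and are not how MySQL’s optimizer is actually coded.

```python
from itertools import permutations

def plan_cost(order, rows, selectivity):
    """Toy cost of a left-deep join: sum of intermediate result sizes."""
    current = rows[order[0]]
    cost = current
    for i, table in enumerate(order[1:], start=1):
        sel = 1.0
        # Apply every join condition between the new table and those joined so far.
        for prev in order[:i]:
            sel *= selectivity.get(frozenset((prev, table)), 1.0)
        current = current * rows[table] * sel
        cost += current
    return cost

def exhaustive(tables, rows, selectivity):
    # Tries all n! orders: optimal under the toy model, hopeless for large n.
    return min(permutations(tables), key=lambda o: plan_cost(o, rows, selectivity))

def greedy(tables, rows, selectivity):
    # Always appends the table that is cheapest right now: n^2 cost
    # evaluations, always terminates quickly, may miss the best plan.
    remaining = set(tables)
    order = []
    while remaining:
        best = min(sorted(remaining),
                   key=lambda t: plan_cost(order + [t], rows, selectivity))
        order.append(best)
        remaining.remove(best)
    return tuple(order)

def choose_plan(tables, rows, selectivity, threshold=7):
    # Exhaustive up to the threshold, greedy beyond it.
    if len(tables) <= threshold:
        return exhaustive(tables, rows, selectivity)
    return greedy(tables, rows, selectivity)
```

With 7 tables the exhaustive branch looks at 5,040 orders; with 15 it would be over a trillion, which is why something has to give.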
This talk brought up a bunch of questions, which I need to follow up on. I’ll report more in a future article.
What fun! I haven’t been this excited since my days at University, scribbling notes as I struggled to understand my teacher’s thick accent and predilection for thinking of everything in terms of real-time databases, sigmas, and so on.
I went with Martin Friebe for supper at a Thai restaurant. On the way there we got talking about table checksum algorithms to detect when a slave is or isn’t in sync with its master. Martin had some great ideas, which I will implement into MySQL Table Checksum to provide another way for you to guarantee two tables have the same data. This particular method will have lower impact on the servers (no locking) and guarantee a consistent read at exactly the same point in the binlog. It will be very useful in certain circumstances. Thank you Martin for the company and the great conversation!
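For readers who have not seen the general idea: comparing two tables by checksum can be sketched as below. This is not Martin’s method (which is not spelled out here) and not MySQL Table Checksum’s actual algorithm; the function names and the CRC32-plus-XOR scheme are assumptions chosen to keep the example short.

```python
import zlib

def row_checksum(row):
    """CRC32 of one row, with columns joined by a separator byte."""
    encoded = "\x1f".join("\\N" if v is None else str(v) for v in row)
    return zlib.crc32(encoded.encode("utf-8"))

def table_checksum(rows):
    """Order-independent checksum: row count plus XOR of row checksums.

    XOR makes the result independent of row order, which matters because
    master and slave may return rows in different orders. The row count
    is kept alongside because pairs of identical rows cancel out under XOR.
    """
    count = 0
    acc = 0
    for row in rows:
        acc ^= row_checksum(row)
        count += 1
    return count, acc

def in_sync(master_rows, slave_rows):
    return table_checksum(master_rows) == table_checksum(slave_rows)
```

A matching checksum is strong evidence, not proof, that the tables are identical; the hard part in practice is making sure both sides are read at the same logical point in time.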
I stumped the judges and picked up a spare copy of Programming Perl. You can never have enough, eh?
Okay, I didn’t really stump the judges; someone asked a question nobody knew the answer to, and I proposed an answer nobody could refute. Let’s see, what does the NDB option ndb_report_thresh_binlog_epoch_slip mean? Is it really the amount of clock skew NDB will permit between the data nodes?
This is a threshold on the number of epochs to be behind before reporting binlog status. For example, a value of 3 (the default) means that if the difference between which epoch has been received from the storage nodes and which epoch has been applied to the binlog is 3 or more, a status message will be sent to the cluster log.
Nope. But I got the book anyway.
Well, I’m fairly slap-happy at this point with jet lag and lack of sleep, but I still want to make a plug for my innotop session tomorrow at 10:45 in Ballroom C. Even if you don’t use InnoDB, you will find this tool has something to offer you. And my presentation and demo are going to be fun, with gratuitous use of stock images. Come on out.
And by the way, I just spoke to someone from another Large Company We All Know, who asked me to implement a new feature in innotop. As Monty is famous for saying, “Trivial. It’s trivial.” If you want to see it, be there; I’ll have it done in time for the session.
Now if you’ll excuse me, I have to fire up Vim…
In my second day at the MySQL Conference and Expo 2007, I attended keynotes, several sessions, and three BoF (Birds of a Feather) sessions. This article is about these sessions. Again, I’ll focus on the Big Ideas and let you read other people’s blog posts for the small details.
There were three keynotes this morning. Two I won’t comment on, but I want to mention the third because it was mostly about the One Laptop Per Child project. I was glad to hear about it instead of what sounded like it was going to be a Red Hat pitch.
This session introduced the Mondrian component of the Pentaho business intelligence suite. Mondrian connects to a SQL backend and converts the flat SQL view of the data into a navigable hierarchical view. The point is to make OLAP scalable on top of MySQL. As such, it touched on tactics for tuning both MySQL and Mondrian — especially aggregation, caching, and cache control in Mondrian. Also on the agenda were near-real-time OLAP (aka “active data warehousing”), and how to cache and invalidate in that scenario. There’s a high cost for doing this, but there can be great benefits as well.
This session featured Digg’s lead developer and lead DBA discussing how Digg built their systems (as opposed to many other sessions, which tell you how you ought to do things). The major components are
- a cluster of web servers
- a memcached farm caching chunks (not whole pages) of content with write-through and some nimble dancing to handle stale data after losing and regaining a server
- MySQL replication with data partitioning and separation for scale-out, with separation into farms by function (search, data warehousing, atomic data)
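The write-through chunk caching mentioned in the list above can be sketched roughly like this. It is not Digg’s code: the `ChunkCache` class is hypothetical, and plain dictionaries stand in for both memcached and the database.

```python
class ChunkCache:
    """Write-through cache for small chunks of content (not whole pages)."""

    def __init__(self, store):
        self.store = store  # stand-in for the database
        self.cache = {}     # stand-in for the memcached farm

    def read(self, key):
        # Serve from cache when possible; on a miss, load and remember.
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]
        self.cache[key] = value
        return value

    def write(self, key, value):
        # Write-through: update the backing store and the cache together,
        # so readers do not see stale chunks after a write.
        self.store[key] = value
        self.cache[key] = value
```

The “nimble dancing” comes in when a cache server disappears and returns holding old values; this sketch does not attempt to handle that case.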
There was another debate from audience members about what the words “shard” and “partition” mean. Someone in the audience even told the Digg people the correct definition of the terms, which did not match what the developers were talking about. *sigh*
Interestingly, it seems Digg is in the lucky position of being able to scale with replication for reads extremely well, since their load is about 98% reads. They also only have about 30GB of data. I assumed it would be in the terabytes.
This talk was the most technical I’ve been to so far. Yasufumi Kinoshita dove deep into InnoDB to analyze points of contention in many-CPU machines under various workloads. His results are impressive; before his changes, InnoDB did not scale beyond four or sometimes even two CPUs, and could even perform dramatically worse on more CPUs than on fewer! After he identified the points of contention, scaling looked quite good up to at least 8 CPUs, with no indication of other problems caused as side effects. Though there’s still work to do and apparently much debugging needs to be done, this is hugely important for MySQL and InnoDB. I’m glad there are people who can do this kind of work. I couldn’t begin to; the speaker even wrote certain parts of the fixes in assembly.
This session by Peter Zaitsev focused on learning which MySQL server settings to configure, and how to find out whether they need to be tuned. Topics included memory allocation, how to fight swapping, and a guided tour of the server status variables.
MySQL’s own Jim and Ann Starkey discussed concurrency control in the still-in-beta Falcon storage engine. They talked about all kinds of database systems, the official standards, and other storage engines, not just Falcon (even PostgreSQL came up). Topics included transaction isolation levels, problems and challenges with those, and how InnoDB’s repeatable read really isn’t. In fact, they are trying to decide what to name that level of transaction isolation. Jim calls it “benchmark mode,” because even though it’s not really standard, it is extremely practical and does very well on benchmarks. It sounds like Falcon will provide a means to emulate InnoDB’s behavior for compatibility if for no other reason.
This talk’s Big Idea was that Falcon is both like and unlike other storage engines.
This made me think of Guy Kawasaki’s keynote from this morning. Who knows what people will use and abuse Falcon for? I’m glad MySQL and the Starkeys are doing what they believe is right, even though a lot of people (including me, frankly) don’t really understand what they are doing and why. My impression is that Falcon is so different from what people are used to that most of us do not “get it,” and probably will not for a long time. Someone will, though. And when they do, and learn how to make it sing and dance in ways nothing else can do, it’ll make a lot of people* mad for not seeing it themselves sooner. Especially when it makes someone really successful.
* People in Redmond, I’m guessing.
I dropped in on the end of this session briefly. One community member suggested MySQL should use OpenID for authentication. Bravo! It’s a capital idea. Another suggestion brought up the fact that MySQL uses BitKeeper for source control. I voiced my regret that MySQL, a company that believes in and promotes software freedom, has fallen into the trap of using non-Free software themselves. It’s sad to see them handcuffed in such a way. Who else remembers when the use of BitKeeper burned the Linux kernel developers? I know Richard Stallman does, because he’d been predicting that fiasco for many years by the time it finally happened. To choose non-Free software is to choose to be a victim.
Pentaho’s Matt Casters spoke on how to extract data from many disparate sources and store it in a Pentaho data warehouse, and how to use Pentaho and MySQL 5.1’s advanced features to make OLAP queries fast (there are two Big Ideas because the talk was double-length). The first part of the talk focused a lot on Spoon, a user interface for telling Pentaho what to do with data (not how). Next he spoke on MySQL 5.1’s table partitions, followed by data partitioning across databases or servers. The idea here is to retrieve and process data in parallel for greater speed.
Have you been reading Matt’s blog? Do you remember his understated post on processing a large volume of data in parallel with near-linear scalability? I’ve been eagerly reading his articles for a while and it was great to hear him speak and see him demo these things.
The techniques he showed are great, but may result in CPU bottlenecks on the server that does the processing, because you can easily get enough data from a bunch of servers in parallel to peg the CPU. The next level of parallelization is the Carte server, which runs on remote machines and is basically grid computing for business intelligence. He gave a demo of this, which looks great. (Hmmm, I wonder if I could get seti@home to run BI for me? Yeah….) Matt finished up with a demo and overview of the Pentaho product overall.
Birds of a Feather Sessions
This evening I went to three BoF sessions: the first on DBD::mysql, the next as a fly on the wall at Paul McCullagh’s streaming blob server BoF, and finally to learn more about MySQL Proxy, which I’ve been excited about ever since I read about it a few weeks ago.
Today’s expert session was a wash because the session and the official lunch were in different places, and people couldn’t bring their lunch to the meeting room. It might come together better tomorrow, it might not.
Of course, I’ll still be doing the two official sessions tomorrow and Thursday.
In my first day at the MySQL Conference and Expo 2007, I attended the Scaling and High Availability Architectures tutorial in the morning, and Real-world MySQL Performance Tuning in the afternoon. This is a brief article on each session’s Big Ideas, and a short blurb about the conference overall so far.
I’ll also be involved in at least three sessions at the conference, and I describe them below.
If you’re interested in short overviews of the sessions I attend, keep watching for my articles. I will give you each session’s major ideas instead of writing stream-of-thought notes. You can look at the presenter’s slides for more.
The conference overall
This conference is well-organized and friendly. Attire is casual; most people are wearing jeans and t-shirts, or khakis and three-button shirts. I found lunch basic but good — catered food, with tables set up in a grassy area in the beautiful California sunshine; nicely dressed tables. I had a nice salad with vinaigrette and crumbled bleu cheese, penne with a sun-dried tomato sauce, red potato salad, and bread.
Pretty much everyone seems to be here. I don’t want to drop names, so I’ll just leave it at that (though I cannot avoid mentioning that I’m rooming with Alexey Kovyrin, who has just released an update to the MySQL Master-Master Replication tool). It is such a pleasure to meet the people I’ve been emailing with; people from all over the world, who use MySQL for all different kinds of things. I also met some people I’ve met at previous events, and whom I consider friends now. Here’s to all of my friends, new and old!
The downside is I miss my wife and Carbon, our loving Rhodesian Ridgeback dog. But I know he will Guard The House® even while I’m gone.
Scaling and High Availability Architectures
This tutorial featured Jeremy Cole and Eric Bergen of Proven Scaling LLC. You probably know them for their generous help giving people rides and passes to various parts of various events, and for contributing patches for things we all need.
Jeremy did most of the talking. The talk was organized roughly into identifying what scaling and high availability are (they’re not the same thing), what problems typically present at various stages of an application’s lifecycle, and some strategies to use and avoid. It promoted application partitioning for scaling, and master-master replication for high availability. All in all a very good discussion of the pros and cons of many concepts, both big and small.
This tutorial was mostly pretty high-level, but frequently got down to specifics.
One of the things I noticed the most from the audience’s questions was how differently many people understand the concept “partitioning.” There were at least three working definitions I heard, and they are not at all the same thing. I think one of the primary obstacles to teaching the principles this talk covered is conveying accurately what it means to partition. The definitions I heard were
- Dividing data into partitions (aka “shards”) and locating each on one of some number of servers.
- Dividing a large table into smaller tables on the same server.
- Using the partitioned tables available in MySQL 5.1.
I also heard people talking about partitioning by date, which I usually associate with archiving.
In the context of the talk, partitioning data means dividing it by something like user ID and locating each partition on one of some number of servers. This is key to horizontal scaling.
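As a concrete, deliberately simple illustration of that definition, here is a sketch that maps a user ID to one of several servers. The host names are placeholders and the modulo scheme is just one possible mapping chosen for brevity; it is not taken from the tutorial.

```python
# Hypothetical shard servers; in practice this list would come from configuration.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]

def shard_for(user_id, shards=SHARDS):
    """Return the server that holds all data for the given user ID."""
    return shards[user_id % len(shards)]
```

Everything belonging to one user lands on one server, so most queries touch a single machine. The weakness of a plain modulo is that adding a server moves most users; a lookup table from user ID to shard avoids that at the cost of an extra lookup.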
Real-world MySQL Performance Tuning
Ask Hansen’s talk was a broad overview of how to scale web applications, from start to finish. It included not only a lot of advice on MySQL, but also suggestions not related to MySQL, such as application-level caching, proxies, failover, etc. He covered a huge amount of material; his slides are interesting and varied, with nice illustrations. He often gave high-level advice, such as “cache aggressively,” but at least as often devoted entire slides to a low-level topic.
Jay’s talk covered much less ground. He focused on specific performance optimizations for MySQL. The topics included indexes, how to know when and how indexes are used, query plans, and server tuning. The slides showed a lot of code examples and the results of various query strategies and indexing changes.
You can catch me, with Mark Callaghan and Peter Zaitsev, tomorrow at lunchtime at an experts’ session on migrating MySQL from 4.0 to 5.0 (organized by solidDB). On Wednesday I’ll be part of a lightning session Lightning Rounds with Top MySQL Community Contributors.
On Thursday I will be giving a session myself, on how to use the innotop MySQL and InnoDB monitor. I designed this session to show you how to go beyond the surface with innotop; my design strategy with innotop is that you should be able to start it and see something useful immediately without the advanced features even being visible, but there’s a tremendous amount of power lurking in it.
I hope to meet you if you’re here, whether at one of my sessions or just in the hallways. Till then, be well and enjoy the conference!