Archive for the ‘Plugins’ tag
A review of MySQL 5.1 Plugin Development by Golubchik and Hutchings
MySQL 5.1 Plugin Development, by Sergei Golubchik and Andrew Hutchings, Packt 2010. About 250 pages. (Here’s a link to the publisher’s site.)
This book is well worth reading for anyone interested in MySQL internals. I learned a lot from it. It is well-written and understandable. I cannot say that I’m planning to write storage engines or more advanced plugins, but I have a great many ideas how to improve MySQL, and I now understand more clearly which of those are suitable to write as plugins, and of what type of plugin is appropriate. I also think I have a better idea how much work these various ideas might involve.
The book begins with an orientation to building plugins on various platforms. Next it covers user-defined functions (UDFs). I have written UDFs, but that’s as far as I have gone with MySQL plugins. The rest of the book covers Daemon plugins, INFORMATION_SCHEMA plugins, full-text parser plugins, and storage engine plugins from basic to advanced. The last example is a nearly complete storage engine built on Tokyo Cabinet, with some pretty advanced functionality. It finishes with a quick overview of the types of plugins available in development and future versions of MySQL, and what’s possible in MariaDB.
The examples are full code listings, with paragraphs of text alternating with a few lines of code. It’s like reading a really well-commented C program, like reading InnoDB source, but with even more explanations. You can download everything you need to build and run the examples yourself — even the sample images used for demonstrating full-text search of EXIF data.
I enjoyed reading about what’s possible in MariaDB. I had not kept up-to-date with the work that’s being done there. If I were a storage engine developer, I’m sure I would appreciate what MariaDB has done. I would speculate that many of the people who’ve written in-house custom storage engines for their own businesses might find MariaDB interesting.
I think that anyone who is planning to modify the MySQL source code should read this book. It could save a lot of work and show easier ways to do things. I learned a lot about the MySQL source code that I have not gotten from other places. This one will go onto my list of essential books for MySQL users.
MySQL Archiver can now archive each row to a different table
One of the enhancements I added to MySQL Archiver in the recent release was listed innocently in the changelog as “Destination plugins can now rewrite the INSERT statement.” Not very exciting or informative, huh? Keep reading.
If you’ve used plugins with MySQL Archiver you know that I created a series of “hooks” where plugins can take some action: before beginning, before archiving each row, etc etc. This lets plugins do things like create new destination tables, aggregate archived rows to summary tables during archiving (great for building data warehouses, though not as sophisticated as Kettle), and so on. Well, this release added a new hook for plugins: custom_sth.
This lets a plugin override the prepared statement the tool will use to insert rows into the archive. By default the prepared statement just inserts into the destination table. But the custom_sth hook lets the plugin inspect the row that’s about to be archived and decide what to do with it. This lets it do interesting things like archive rows to different tables.
This came up because some of the tables I’m archiving to suddenly hit the bend in the hockey-stick curve. I diagnosed the problem very simply: inserts began taking most of the time during archiving. As you might know, MySQL Archiver has a statistics mode where it profiles every operation and reports the stats at the end. I’m archiving out of InnoDB into MyISAM; take a look at the stats:
Action Count Time Pct inserting 800584 12722.8245 88.35 deleting 800584 1464.1040 10.17 print_file 800584 58.3453 0.41 commit 3204 29.4391 0.20 select 1602 8.5654 0.06 other 0 116.5321 0.81
Inserting suddenly took 88% of the time spent archiving, when it had been taking a very small fraction of the time. I’d been meaning to split the archived data out by date and/or customer, and this convinced me it was time to stop procrastinating. There are columns in the archived rows for both of these dimensions in the data, so it shouldn’t be hard. So I added the custom_sth hook, wrote a 40-line plugin, and did it. Results:
Action Count Time Pct deleting 51675 525.2777 87.62 inserting 51675 49.3903 8.24 print_file 51675 4.4639 0.74 commit 208 2.1553 0.36 custom_sth 51675 1.4575 0.24 select 104 0.6714 0.11 before_insert 51675 0.1135 0.02 before_begin 1 0.0001 0.00 plugin_start 1 0.0000 0.00 after_finish 1 0.0000 0.00 other 0 15.9868 2.67
(You can see the effect of having a plugin, because the time taken for all the hooks is listed in the stats. There was no plugin previously.)
Now inserting takes only 8% of the time required to archive. Put another way, it used to insert 63 rows per second, now it inserts 1046 rows per second. This is single-row inserts. (It is not intended to archive fast; it is intended to archive without disturbing the OLTP processes. Obviously this server can do a lot more inserts and deletes than this.)
What had happened? The MyISAM tables on the destination end had just gotten too big for their indexes to fit in memory, and the inserts had suddenly slowed dramatically. I didn’t want to give them a lot more memory, because I want the memory to be used for the InnoDB data on that machine. This is the same kind of thing, I’d guess, that Kevin Burton just wrote about.
Oh yeah, while I was at it, I totally rewrote the archiver with unit-tested, test-driven, test-first, other-buzzword-compliant code. I added a lot of other improvements, too. For example, it can now archive tables that have much harder keys to optimize efficiently, such as nullable non-unique non-primary keys.






