Archive for the ‘High Performance MySQL’ tag
I wrote a couple weeks ago about my work on the Backup and Recovery chapter for High Performance MySQL, 2nd Edition. Thanks for your comments and suggestions, and thanks to those of you who helped me over email as well.
I’ve had several questions about what is included in the chapter, so I thought I’d post the outline as it stands now:
- [Introduction]
  - It's All About Recovery
  - Topics We Won't Cover
- Why Backups?
- Considerations and Tradeoffs
  - What Can You Afford to Lose?
  - Online or Offline Backups?
  - Dump or Raw Backup?
  - What to Back Up
  - Incremental Backups
  - Storage Engines and Consistency
    - Data Consistency
    - File Consistency
  - Replication
- Backing Up Data
  - Dumping Data from MySQL
    - SQL dumps
    - Delimited File Dumps
    - Parallel Dump and Restore
  - Filesystem Snapshots
    - How LVM Snapshots Work
    - Prerequisites and Configuration
    - Creating, Mounting and Removing an LVM Snapshot
    - Warm Backups with LVM Snapshots
    - Hot InnoDB Backups with LVM Snapshots
  - Copying Files Across the Network
- Restoring from a Backup
  - Restoring from Raw Files
    - Starting MySQL After Restoring Raw Files
  - Restoring from Dumps
    - Loading SQL Dumps
    - Loading Delimited Dumps
  - Point-In-Time Recovery
  - More Advanced Recovery Techniques
    - Delayed Replication for Fast Recovery
    - Filtering Through Replication
    - InnoDB Recovery
- Backup and Recovery Speed
- Backup Tools
  - mysqldump
  - mysqlhotcopy
  - InnoDB Hot Backup
  - mylvmbackup
  - Zmanda Recovery Manager
    - Installing and Testing ZRM
  - Comparison of Backup Tools
- Scripting Backups
Whew! Even with such a detailed outline, it’s hard to tell how much material is in there (it could be all headings and no text, right?). To give you a rough idea, it’s 32 pages in OpenOffice.org. The least in-depth parts, I’d say, are “Why Backups?” and the last two sections. While writing, I realized that a lot of these topics are not specific to MySQL, and there are other books specifically about backup that you should read. My focus for this book, I decided, should be on High Performance MySQL Backup and Recovery.
That’s why I went into such significant detail. For example, the section on copying files across the network is not fluff: it’s a set of benchmarks comparing file-copy methods. And in the section on loading SQL dumps, I show you how to use sed to extract the CREATE TABLE statement for one table out of a huge all-tables dump without decompressing the file and opening it in a text editor (just in case you were silly enough to dump everything into one monolithic file). At present I’d say this chapter has at least four or five times as much material as its counterpart in the first edition.
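The chapter itself uses sed for this. As a rough illustration of the same idea, here is a minimal Python sketch (not the command from the book) that streams a gzip-compressed dump and stops reading as soon as it has one table’s CREATE TABLE statement. It assumes mysqldump-style formatting, where the statement starts with a line like CREATE TABLE `name` ( and ends on the first line ending in a semicolon; the function names, file name and table name are hypothetical.

```python
import gzip
import re


def extract_create_table(lines, table):
    """Yield the lines of the CREATE TABLE statement for `table`.

    Assumes mysqldump-style output: the statement begins with
    'CREATE TABLE `name` (' and ends on the first line ending in ';'.
    """
    # Require "` (" after the name so that table `a` does not match `ab`.
    start = re.compile(r"^CREATE TABLE `" + re.escape(table) + r"` \(")
    inside = False
    for line in lines:
        if not inside and start.match(line):
            inside = True
        if inside:
            yield line
            if line.rstrip().endswith(";"):
                return  # stop early; the rest of the dump is never read


def create_table_from_gzip(path, table):
    # gzip.open in text mode decompresses chunk by chunk as we iterate,
    # so no uncompressed copy of the dump is written to disk.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return "".join(extract_create_table(f, table))
```

For example, `print(create_table_from_gzip("all-tables.sql.gz", "orders"))` would print just that table’s definition. Everything before the statement is still decompressed in memory as it streams past, but the file after the statement is never touched.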
A side effect of working on this chapter is that it motivated me to finish the work I had half-done on parallel dumps (see my most recent few posts for more on this). All good stuff.
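At its simplest, a parallel dump means running several dump processes at once, typically one per table. The sketch below is a hypothetical illustration, not the author’s tool: it launches one mysqldump per table through a thread pool and assumes connection credentials come from a MySQL option file such as ~/.my.cnf. Each mysqldump process takes its own --single-transaction snapshot, so every table is consistent on its own but not necessarily with the others; the function names and output file layout are assumptions.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dump_command(database, table, outdir):
    """Build the mysqldump command line for one table (hypothetical layout)."""
    outfile = os.path.join(outdir, f"{database}.{table}.sql")
    return [
        "mysqldump",
        "--single-transaction",      # consistent snapshot of this table only
        f"--result-file={outfile}",  # write output straight to the file
        database,
        table,
    ]


def parallel_dump(database, tables, outdir, workers=4, run=subprocess.run):
    """Dump each table in its own process, at most `workers` at a time.

    Returns {table: exit_code}. `run` is injectable so the scheduling
    logic can be exercised without a MySQL server.
    """
    def dump_one(table):
        completed = run(dump_command(database, table, outdir))
        return table, completed.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(dump_one, tables))
```

A call such as `parallel_dump("shop", ["orders", "customers"], "/backups", workers=2)` returns each table’s exit code, so a wrapper script can check for any non-zero value before declaring the backup good.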
I’ll post “further updates as events warrant.”
Progress on High Performance MySQL, Second Edition is coming along nicely. You have probably noticed the lack of epic multi-part articles on this blog lately — that’s because I’m spending most of my spare time on the book. At this point, we have significant work done on some of the hardest chapters, like Schema Optimization and Query Optimization. I’ve been deep in the guts of those hard optimization chapters for a while now, so I decided to venture into lighter territory: Backup and Recovery, which is one of the few chapters we planned to “revise and expand” from the first edition, rather than completely writing from scratch.
Since we decided to take that approach, I began by following the outline from the first edition, planning to re-read the first edition’s chapter, re-outline it, and then add more material as appropriate. To my surprise, I found this chapter in the first edition is one of the most cursory (I don’t mean to criticize too much; you’ll see where I’m going with this in a second). It’s quite short and doesn’t really discuss recovery at all, despite the chapter title. There’s one subsection titled “Recovery,” but it’s only a few paragraphs, and it mostly discusses dumping, not recovery! [Edit: whoops, I see each subsection in the "Tools and Techniques" section has a few words about how to restore backups created with that specific tool. But there's still not much general advice about how to restore backups.]
The chapter devotes a lot of space to code listings and such and, in my opinion, not enough to how to do high-performance backups in a high-performance application. I quickly decided it needed to be significantly expanded, not just updated, so I scrapped the original text and took more liberties with the outline. I’m referring to the first edition as I write, but I’m not keeping any of the text. Chalk it up to perfectionism.
The outline, as I have it so far, is as follows. If you compare it to the first edition, you’ll see I’ve rearranged it quite a bit:
1 Why Backups? (very brief, even more so than the first edition)
2 Considerations and Tradeoffs
  2.1 How Much Can You Afford to Lose?
  2.2 Online or Offline?
  2.3 Dump or Raw Backup?
  2.4 Onsite or Offsite?
  2.5 What to Back Up
  2.6 Storage Engines and Consistency
  2.7 Replication
3 Restoring from a Backup
  3.1 Copying Files Across the Network
  3.2 Starting MySQL
  3.3 Point-In-Time Recovery
4 Tools and Techniques
  4.1 mysqldump
  4.2 mysqlhotcopy
  4.3 Zmanda Recovery Manager
  4.4 InnoDB Hot Backup
  4.5 Offline Backups
  4.6 Filesystem Snapshots
  4.7 MySQL Global Hot Backup
  4.8 Automating and Scripting Backups
5 Rolling Your Own Backup Script
At this point, I have written sections 1, 2 and 3, which are about 11 pages in OpenOffice.org (compared to 6 printed pages in the first edition). I’m sure this will only grow as other things occur to me. The outline of section 4 is completely open to change, and section 5 might not even happen: if you can script, you don’t need me to show you how, and if you can’t, you might want to use one of the tools listed in section 4. All in all, I’d say we’re looking at about 25 to 30 pages, just based on what’s in my head and not yet written down.
Now, to come to my point: what would be helpful to you? Are there any challenges you’d like me to cover, such as how to back up a data warehouse with terabytes of data? (I’ve already covered that in “What to Back Up,” but feel free to ask anyway.) Are there challenges you have had to solve that you think would be very helpful to others? This chapter is largely open to suggestion at this point. Tell us what you’d like to see: this is your opportunity to get at least four experts to solve your problems in depth.
The usual disclaimers apply: no guarantees, this is all open to change, this is top-secret pre-production material anyway and you never saw this web page. What is the first rule of Fight Club, again?
I’m looking forward to your feedback.
We’ve begun writing the second edition of the now-classic High Performance MySQL. “We” means co-authors Arjen Lentz (formerly of MySQL), Baron Schwartz (that’s me), and Vadim Tkachenko and Peter Zaitsev, both formerly of MySQL’s high-performance team and now partners at Percona, a high-performance MySQL consultancy and host of the popular MySQL Performance Blog. Neither of the first edition’s authors (Jeremy Zawodny and Derek Balling) is working on this project, but they’re with us in spirit, I think. O’Reilly is still the publisher, and Andy Oram is still the editor.
Though we’re theoretically revising and updating the first edition, we’re actually starting from scratch and re-writing the book. We’re expanding it from the first edition’s 265 pages to 384, according to the contract, but my unofficial guess is it’ll go well over 400 pages. A lot has changed since Jeremy and Derek wrote the first edition — high performance MySQL is a bigger subject today, with different techniques, tools and technologies, and of course a much more complicated MySQL server. The second edition will remain the definitive reference for building high-performance, scalable systems with MySQL.
We’re early in the process, so it’s hard to know how far into the future we can safely look. Still, just to whet your appetite, here’s the table of contents:
- Back to Basics
- MySQL Architecture
- Finding Bottlenecks: Profiling and Benchmarks
- Schema Optimization and Indexing
- Query Performance Optimization
- Advanced SQL Functionality
- Optimizing Server Settings
- Operating System and Hardware Optimization
- Scaling and High Availability
- Application-Level Optimization
- Backup and Recovery
- Analyzing Server Status
- Tools for High Performance
Stay tuned for more news as the book progresses. The four of us plan to blog as we go.