Xaprb

Stay curious!

Archive for the ‘High Performance MySQL’ tag

Progress on High Performance MySQL Backup and Recovery chapter

I wrote a couple weeks ago about my work on the Backup and Recovery chapter for High Performance MySQL, 2nd Edition. Thanks for your comments and suggestions, and thanks to those of you who helped me over email as well.

I’ve had several questions about what is included in the chapter, so I thought I’d post the outline as it stands now:

[Introduction]
It's All About Recovery
Topics We Won't Cover
Why Backups?
Considerations and Tradeoffs
  What Can You Afford to Lose?
  Online or Offline Backups?
  Dump or Raw Backup?
  What to Back Up
    Incremental Backups
  Storage Engines and Consistency
    Data Consistency
    File Consistency
  Replication
Backing Up Data
  Dumping Data from MySQL
    SQL Dumps
    Delimited File Dumps
    Parallel Dump and Restore
  Filesystem Snapshots
    How LVM Snapshots Work
    Prerequisites and Configuration
    Creating, Mounting and Removing an LVM Snapshot
    Warm Backups with LVM Snapshots
    Hot InnoDB Backups with LVM Snapshots
  Copying Files Across the Network
Restoring from a Backup
  Restoring from Raw Files
    Starting MySQL After Restoring Raw Files
  Restoring from Dumps
    Loading SQL Dumps
    Loading Delimited Dumps
  Point-In-Time Recovery
  More Advanced Recovery Techniques
    Delayed Replication for Fast Recovery
    Filtering Through Replication
  InnoDB Recovery
Backup and Recovery Speed
Backup Tools
  mysqldump
  mysqlhotcopy
  InnoDB Hot Backup
  mylvmbackup
  Zmanda Recovery Manager
    Installing and Testing ZRM
  Comparison of Backup Tools
Scripting Backups

Whew! Even with such a detailed outline, it’s hard to tell how much material is in there (it could be all headings and no text, right?). To give you a rough idea, it’s 32 pages in OpenOffice.org. In fact, I’d say the places that are the least in-depth are “Why Backups?” and the last two sections. As I wrote, I became conscious that a lot of these topics are not specific to MySQL, and there are other books specifically about backup that you should read. My focus for this book, I decided, should be on High Performance MySQL Backup and Recovery.

That’s why I went into such significant detail. For example, the section on copying files across the network is not fluff. It’s benchmarks of file copy methods. And in the section on loading SQL dumps, I show you how to use sed to extract the CREATE TABLE statement for one table out of a huge all-tables dump without decompressing the file and opening it with a text editor (just in case you were silly enough to dump everything into one monolithic file). At present I’d say this chapter has at least four or five times more material than its counterpart in the first edition.
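
The same idea can be sketched in Python instead of sed. This is a minimal illustration, not the method from the chapter: it assumes a gzip-compressed dump in mysqldump's default layout, where each CREATE TABLE statement starts on its own line and its last line ends with a semicolon. The function names and the file path in the usage note are hypothetical.

```python
import gzip
import re


def extract_create_table(lines, table):
    """Return the CREATE TABLE statement for `table` from an iterable of dump lines.

    Assumes mysqldump's default layout: the statement starts on its own
    line and the final line of the statement ends with ";".
    Returns an empty string if the table is not found.
    """
    start = re.compile(
        r"^CREATE TABLE (?:IF NOT EXISTS )?`?" + re.escape(table) + r"`? \("
    )
    statement = []
    for line in lines:
        # Skip everything until the opening line of the wanted statement.
        if not statement and not start.match(line):
            continue
        statement.append(line)
        # Stop at the first collected line that ends the statement.
        if line.rstrip().endswith(";"):
            break
    return "".join(statement)


def extract_from_gzip(path, table):
    """Stream a gzip-compressed dump line by line, never holding it all in memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        return extract_create_table(dump, table)
```

Calling `extract_from_gzip("all-tables.sql.gz", "actor")` (hypothetical file and table names) reads the dump as a stream and stops as soon as the statement is complete, so nothing after that table's definition is decompressed.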

A side effect of working on this chapter is that it motivated me to finish the work I had half-done on parallel dumps (see my most recent few posts for more on this). All good stuff.

I’ll post “further updates as events warrant.”

Written by Xaprb

October 2nd, 2007 at 9:37 am

High Performance MySQL, Second Edition: Backup and Recovery

Progress on High Performance MySQL, Second Edition is coming along nicely. You have probably noticed the lack of epic multi-part articles on this blog lately; that’s because I’m spending most of my spare time on the book. At this point, we have significant work done on some of the hardest chapters, like Schema Optimization and Query Optimization. I’ve been deep in the guts of those hard optimization chapters for a while now, so I decided to venture into lighter territory: Backup and Recovery, which is one of the few chapters we planned to “revise and expand” from the first edition, rather than rewriting it completely from scratch.

Since we decided to take that approach, I began by following the outline from the first edition, and figured I’d re-read the first edition’s chapter and re-outline, then add more material as appropriate. To my surprise, I found this chapter in the first edition is one of the most cursory (I don’t mean to criticize too much — you’ll see where I’m going with this in a second). It’s quite short and doesn’t really discuss recovery at all, despite the chapter title. There’s one sub-section titled “Recovery,” but it’s only a few paragraphs, and mostly discusses dumping, not recovery! [Edit: whoops, I see each subsection in the "Tools and Techniques" has a few words about how to restore backups created with that specific tool. But there's still not much general advice about how to restore backups.]

The chapter devotes a lot of space to code listings and such, and not enough to how to do high-performance backups in a high-performance application, in my opinion. I quickly decided it needed to be significantly expanded, not just updated, so I scrapped the original text and became more liberal with the outline. I’m referring to the first edition as I write, but I’m not keeping any of the text. Chalk it up to perfectionism.

The outline, as I have it so far, is as follows. If you compare it to the first edition, you’ll see I’ve rearranged it quite a bit:

1 Why Backups?
   (very brief, even more so than the first edition)
2 Considerations and Tradeoffs
   2.1 How Much Can You Afford to Lose?
   2.2 Online or Offline?
   2.3 Dump or Raw Backup?
   2.4 Onsite or Offsite?
   2.5 What to Back Up
   2.6 Storage Engines and Consistency
   2.7 Replication
3 Restoring from a Backup
   3.1 Copying Files Across the Network
   3.2 Starting MySQL
   3.3 Point-In-Time Recovery
4 Tools and Techniques
   4.1 mysqldump
   4.2 mysqlhotcopy
   4.3 Zmanda Recovery Manager
   4.4 InnoDB Hot Backup
   4.5 Offline Backups
   4.6 Filesystem Snapshots
   4.7 MySQL Global Hot Backup
   4.8 Automating and Scripting Backups
5 Rolling Your Own Backup Script

At this point, I have written sections 1, 2 and 3, which are about 11 pages in OpenOffice.org (compared with 6 pages on paper in the first edition). I’m sure this will only grow as other things occur to me. The outline of section 4 is completely open to change, and section 5 might not even happen; if you can script, you can script. Otherwise, you might want to use one of the tools listed in section 4. All in all, I’d say we’re looking at about 25 to 30 pages, just based on what’s in my head and not yet written down.

Now, to come to my point: what would be helpful to you? Are there any challenges you’d like me to cover, such as how you back up a data warehouse with terabytes of data? (I’ve already done that, in What To Back Up, but feel free to ask anyway.) Are there challenges you have had to solve, which you think would be very helpful to others? This chapter is largely open to suggestion at this point. If you tell me/us what you’d like to see, this is your opportunity to get at least four experts to solve your problems in-depth.

The usual disclaimers apply: no guarantees, this is all open to change, this is top-secret pre-production material anyway and you never saw this web page. What is the first rule of Fight Club, again?

I’m looking forward to your feedback.

Written by Xaprb

September 19th, 2007 at 5:46 pm

Coming soon: High Performance MySQL, Second Edition

We’ve begun writing the second edition of the now-classic High Performance MySQL. “We” means co-authors Arjen Lentz (formerly of MySQL), Baron Schwartz (that’s me), and Vadim Tkachenko and Peter Zaitsev, both formerly of MySQL’s high-performance team and now partners at Percona, a high-performance MySQL consultancy firm and host of the popular MySQL Performance Blog. Neither of the first edition’s authors (Jeremy Zawodny and Derek Balling) is working on this project, but they’re with us in spirit, I think. O’Reilly is still the publisher, and Andy Oram is still the editor.

Though we’re theoretically revising and updating the first edition, we’re actually starting from scratch and re-writing the book. We’re expanding it from the first edition’s 265 pages to 384, according to the contract, but my unofficial guess is it’ll go well over 400 pages. A lot has changed since Jeremy and Derek wrote the first edition — high performance MySQL is a bigger subject today, with different techniques, tools and technologies, and of course a much more complicated MySQL server. The second edition will remain the definitive reference for building high-performance, scalable systems with MySQL.

We’re early in the process, so it’s hard to know how far into the future we can safely look. Still, just to whet your appetite, here’s the table of contents:

  1. Preface
  2. Back to Basics
  3. MySQL Architecture
  4. Finding Bottlenecks: Profiling and Benchmarks
  5. Schema Optimization and Indexing
  6. Query Performance Optimization
  7. Advanced SQL Functionality
  8. Optimizing Server Settings
  9. Operating System and Hardware Optimization
  10. Scaling and High Availability
  11. Application Level Optimization
  12. Backup and Recovery
  13. Security
  14. Analyzing Server Status
  15. Tools for High Performance

Stay tuned for more news as the book progresses. The four of us plan to blog as we go.