Xaprb

Stay curious!

Archive for September, 2010

Aspersa gets a user manual

with 3 comments

It doesn’t exist until it is nicely documented. Now Aspersa has documentation. Writing these tools has taught me how powerful and flexible Bash can be. Solving MySQL problems is a lot easier with good tools!

Written by Xaprb

September 22nd, 2010 at 9:55 pm

Posted in Aspersa,Maatkit,SQL

How LOCK TABLES interacts with the MySQL slow query log

with 3 comments

Here’s a little trivia that you might find helpful. Suppose that I have the following in the MySQL slow query log (abbreviated for clarity):

# User@Host: root[root] @ localhost [127.0.0.1]
# Time: 100919 17:58:52
# Query_time: 9.648427  Lock_time: 8.648039
select sleep(1) from t limit 1;

To get this into the slow query log, I set the long_query_time to 0 and opened two sessions. In one session I ran LOCK TABLES t WRITE, and in the other I tried to select from that table. As you can see above, 1) LOCK TABLES contributes to the Lock_time number, and 2) the Query_time is the sum of execution time and lock time.

Now, I’ll set long_query_time = 2 and run the same test. What happens? Nothing shows up in the slow query log, because 3) the time spent waiting for table locks doesn’t count towards the slow query time threshold.

A final note: it was rumored that the LOCK TABLES query itself is somehow a special-case that is never logged to the slow query log. However, this is not true; if long_query_time is set to zero, the LOCK TABLES query will appear in the log.

Written by Xaprb

September 19th, 2010 at 6:08 pm

A review of Relational Database Design and the Optimizers by Lahdenmaki and Leach

with one comment

Relational Database Index Design and the Optimizers

Relational Database Index Design and the Optimizers

Relational Database Index Design and the Optimizers. By Tapio Lahdenmaki and Mike Leach, Wiley 2005. (Here’s a link to the publisher’s site).

I picked this book up on the advice of an Oracle expert, and after one of my colleagues had read it and mentioned it to me. The focus is on how to design indexes that will produce the best performance for various types of queries. It goes into quite a bit of detail on how databases execute specific types of queries, including sort-merge joins and multiple index access, and develops a generic cost model that can be used to produce a quick upper-bound estimate (QUBE) for the execution time of a query. The book focuses on DB2, Oracle, and SQL Server, but applies equally well to MySQL and PostgreSQL. I learned a lot from this book, and will add it to my list of essential books.

There are too many myths and rules of thumb about index design. This book debunks them pretty thoroughly. It walks the reader through the process of understanding what a database does to execute a query, and how much that costs; and then what a database does to execute a data modification, and how much that costs. Given this knowledge, you can answer questions such as “what is the ideal index for each of these two queries?” and “should the queries have separate indexes, or is it better to find a compromise that will be good for both of them?” and even “how much slower will the compromise be for each query?” In many cases, the results are non-obvious, and often don’t agree with the rules of thumb you might have been taught. Generally, the book concludes, we should use indexes much more than we often do, and we should not hold irrational fears about the cost of maintaining indexes.

After reading this book, you’ll understand what makes an index good or bad for a query (a three-star ranking system), what makes a query possible or impossible to index ideally, the quick upper-bound estimate of execution time, the Basic Question, finding the cheapest adequate index, difficult predicates, index slices, and a host of other valuable concepts. In addition, there’s an entire chapter on a method for finding queries that are not well indexed. Some of the methods in this book are things I already had notes to implement in Maatkit tools, but others are new to me. The method of finding promising culprits is something I learned in this book, and I think it’s very valuable for a tool such as mk-query-digest with the Percona enhancements to the slow query log.

There are a few things I’ll point out so it doesn’t seem like an unqualified endorsement. One, the book is not as easy to read as it could be. The editors should have removed 99% of the places where the authors italicized or otherwise emphasized words; there’s a lot of emphasis on relatively unimportant or random words. Barely a sentence is free of italics. Second, the book was written in 2005 and today’s machines have much more memory. (This generally makes the book’s points more valid, not less valid.) Finally, the cost model is based on spinning disks, and the QUBE method needs slightly different parameters to work correctly on solid-state storage, or indeed even many modern SANs. However, that’s not a big deal — just measure your storage system’s performance, plug in the correct random versus sequential access time, and the model is still valid.

Note that although PostgreSQL does not yet support index-only queries, which is a major focus of the book, the various cost models apply equally well. One must simply account for the cost of the table access, and not assume that the index is the only thing that’s touched by the query. In general, you’re going to need to know the internals of your database server to apply this book’s wisdom.

Written by Xaprb

September 19th, 2010 at 5:32 pm