Xaprb

Stay curious!

How to debug InnoDB lock waits

with 14 comments

This article shows you how to use a little-known InnoDB feature to find out what is holding the lock for which an InnoDB transaction is waiting. I then show you how to use an undocumented feature to make this even easier with innotop.

Background

One of the most common complaints I’ve heard from DBAs used to other database servers is “I can’t find out who holds the locks that are blocking all these connections and making them time out.” I feel your pain. Before I helped scale my employer’s systems to deal with larger volumes of data, InnoDB lock contention was a serious issue. And as far as I knew, you couldn’t find out who was holding locks. I knew you could see who was waiting for locks to be granted; that’s easy. You just run SHOW INNODB STATUS and look for the following text:

------------
TRANSACTIONS
------------
Trx id counter 0 4874
Purge done for trx's n:o < 0 4869 undo n:o < 0 0
History list length 21
Total number of lock structs in row lock hash table 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 4873, ACTIVE 6 sec, process no 7142, OS thread id 1141152064 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368
MySQL thread id 9, query id 173 localhost root Sending data
select * from t1 for update
------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 9 page no 3 n bits 72 index `PRIMARY` of table `test/t1` trx id 0 4873 lock_mode X waiting
...

That’s fine, but who holds the lock? I thought there was no way to find that out.

InnoDB Lock Monitor

Until I learned about the InnoDB Lock Monitor, that is. You enable it by running the following command:

CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;

It’s quite an ugly hack, but it turns out the table name is actually “magical.” It’s a special table name that tells InnoDB to start the lock monitor. You can stop it by dropping the table again.

This little-noticed feature makes InnoDB print out a slightly modified version of what you see with SHOW INNODB STATUS. The “slight modification” is to print out not only the locks the transaction waits for, but also those it holds. For example, here’s the transaction that holds the locks:

---TRANSACTION 0 4872, ACTIVE 32 sec, process no 7142, OS thread id 1141287232
2 lock struct(s), heap size 368
MySQL thread id 8, query id 164 localhost root
TABLE LOCK table `test/t1` trx id 0 4872 lock mode IX
RECORD LOCKS space id 9 page no 3 n bits 72 index `PRIMARY` of table `test/t1` trx id 0 4872 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 000000000d35; asc      5;; 2: len 7; hex 800000002d0110; asc     -  ;;

That’s fine, but there are, ah, limitations. As the manual says, InnoDB periodically prints out this text — essentially spewing InnoDB’s guts — to its standard output. This gets redirected to the server error log in any sane installation. Who’s looking there? And it gets printed out at long intervals, which seems to be about every 16 seconds on the machines I use.

Plus, if you’ve looked at the result, you’ll understand this is not something you want to search through manually looking for data. The output can be absolutely huge. What DBA wants to pore over thousands of hex-dumped rows from the table just to answer the question “who holds that lock?”

All in all, this is not very convenient (yep, I know that’s an understatement).

Slightly more convenient

What’s a little more convenient than combing through all that text by hand is writing a program to parse InnoDB’s status output. You don’t have to, though. That’s what I wrote innotop to do. And I’ve just released version 1.5.2, which at long last has the ability to watch a log file as well as connecting to server(s).

Here’s how this works: you start innotop, and press the L key to switch to Lock mode. This replaces the old Lock Wait mode, which was only able to monitor the InnoDB lock waits you see in the normal output of SHOW INNODB STATUS.

This mode shows you something like the following:

_____________________________ InnoDB Locks __________________________
CXN   ID  Type    Waiting  Wait   Active  Mode  DB    Table  Index
file  12  RECORD        1  00:10   00:10  X     test  t1     PRIMARY
file  12  TABLE         0  00:10   00:10  IX    test  t1
file  12  RECORD        1  00:10   00:10  X     test  t1     PRIMARY
file  11  TABLE         0  00:00   00:25  IX    test  t1
file  11  RECORD        0  00:00   00:25  X     test  t1     PRIMARY

That’s helpful! I can see the locks held and waited for in a nice tabular format. It’s pretty easy to see connection 11 is blocking connection 12.

This is still pretty inconvenient, though. To get access to the server’s error log, I have to run innotop on the database server machine itself. Is there a better way?

Even better

There is, in fact, but I discovered it completely by accident. It’s not documented, but the extra information doesn’t just get printed to the server log. It also shows up in SHOW INNODB STATUS! Now that’s a nice surprise. It means innotop can get lock information from a normal connection instead of monitoring a log file.

After discovering this, I immediately added some more features to innotop. There are now hot-keys in L mode to enable and disable the lock monitor. Now you can press L, press the ‘a’ key to start the lock monitor, see what’s blocking the waiting transaction, press ‘o’ to stop the lock monitor, and you’re done.

Best yet

I’m sure you InnoDB administrators already recognize what an improvement this is over the options you previously had (essentially, you didn’t have any). There’s still a long way to go, though. Locks could be in the INFORMATION_SCHEMA or in a SHOW LOCKS command. I won’t speculate on why they aren’t already.

Of course, the upcoming Falcon storage engine already has better features for debugging lock contention than this. But my guess is it’ll be a long time before Falcon has the market share InnoDB has. All things considered, InnoDB is a pretty nice piece of software.

Conclusion

Download innotop

the source

The conclusion to this whole article is: use innotop if you use InnoDB. Heck, use it if you use MySQL at all. It makes a lot of things a lot easier, not just debugging InnoDB lock contention. Feedback is welcome — just use the Sourceforge bug tracker, forums, and mailing lists.

Written by Xaprb

September 18th, 2007 at 5:28 pm

Posted in Uncategorized

Tagged with , , , , ,

14 Responses to 'How to debug InnoDB lock waits'

Subscribe to comments with RSS

  1. Thats awesome Baron, could have used this just yesterday…time to update innotop.

    thanks,

    Keith

    Keith Murphy

    18 Sep 07 at 10:21 pm

  2. Excellent post. This is what i was looking for a long time.
    Regards,
    Pranab
    http://www.goguwahati.com

    Pranab

    18 Sep 08 at 4:39 am

  3. Baron, I just wanted to say this is the greatest tool for MySQL ever made. We struggled for days trying to figure out what was holding locks in our database until we found this tool. 5 minutes after installing it, we figured out the issue. Thanks a ton for your efforts!

    Caine

    29 Apr 10 at 12:35 pm

  4. Excellent, I’m glad it helped.

    Xaprb

    29 Apr 10 at 12:41 pm

  5. Big fan of your work!

    innotop is just an absolutely awesome tool…

    Thanks

    Mark

    17 May 10 at 12:05 am

  6. Sir, you are a hero. Innotop is a great tool.

    GDR!

    31 Aug 10 at 12:53 pm

  7. Thanks :-)

    Xaprb

    31 Aug 10 at 2:20 pm

  8. This was quite useful in resolving a problem I ran into today. A replicated instance, which we use for reporting, had this issue and the customer was complaining about data not being up-to-date on their reports.

    I spotted the problem quickly with Innotop, then checked the MySQL logs to see the offending statement. Once I ran the statement and started the slave, everything syncd up great and was back up and running again.

    Since this is the first full week we have been using a replicated instance for reporting, I figured we need to improve the tolerance of MySQL server to failure. So I changed the innodb_lock_wait_timeout to 500 and slave_transaction_retries to 50. Hopefully there are no unintended consequences to this. If you or anyone else knows of anything I didn’t consider, let me know and I’ll consider other avenues.

    Chris

    4 Sep 10 at 2:57 am

  9. Great article. Thanks.
    Anyway worth to mention: check if the queries causing deadlocks have all where statement’s column indexes on it. It will decrease the effect. It helped for me.

    ixdide

    23 Nov 10 at 11:37 am

  10. Segfault

    [RO] Query List (? for help) localhost, 20:07.003, 183.58 QPS, 23/1/3 con/run/cac thds, 5.0.77-log

    When Load QPS Slow Se/In/Up/De% QCacheHit KCacheHit BpsIn BpsOut
    Total 0.00 183.58 53 74/ 1/13/ 8 0.00% 98.28% 20.31k 61.87k

    Cmd ID State User Host DB Time QuerySegmentation fault

    (v 1.8)

    Marcel

    3 Jan 11 at 7:18 am

  11. Marcel, please don’t use a random blog post as a support forum. Use the innotop mailing list and bug tracker system. (You will find the solution to your problem already there.)

    Xaprb

    3 Jan 11 at 11:45 am

  12. Baron:

    Thanks for a great article and great tool. I spread the love at every turn.

    Is there a follow up coming on how to track down the lock holder and tune/eliminate the issue from that side?

    Boyd

    19 Jan 11 at 6:52 pm

  13. There is now a set of tables in the INFORMATION_SCHEMA that gives much better capabilities for that. I’ll put that onto my to-blog list.

    Xaprb

    20 Jan 11 at 12:37 am

  14. Hey, thank you so much for the great tool. I’m trying to troubleshoot a lock timeout and don’t seem to find the right option there is a continuous activity on the database every second which makes it hard to look through the different locks. When I press ‘L’ I can only see the trx that are waiting but none with waiting=0 so I can’t tell who’s holding the lock. The ‘a’ option asks for a query and if I try one of the queries that are locked it either timesout or goes through with no explanation. Any idea?

    Noha

    22 Mar 13 at 7:46 pm

Leave a Reply