Comments on: Speaking at Oracle Open World http://www.xaprb.com/blog/2011/08/13/speaking-at-oracle-open-world/ Stay curious! Thu, 02 May 2013 12:36:53 +0000 hourly 1 http://wordpress.org/?v=3.5.1 By: Xaprb http://www.xaprb.com/blog/2011/08/13/speaking-at-oracle-open-world/#comment-19576 Xaprb Mon, 15 Aug 2011 12:22:30 +0000 http://www.xaprb.com/blog/?p=2421#comment-19576 Mark, thanks for pointing this out. I felt very silly when I saw your reply. Obviously this is instrumented in 5.5 and newer. I was stuck in the 5.1 and older mentality.

]]>
By: Mark Leith http://www.xaprb.com/blog/2011/08/13/speaking-at-oracle-open-world/#comment-19575 Mark Leith Mon, 15 Aug 2011 11:09:46 +0000 http://www.xaprb.com/blog/?p=2421#comment-19575 In the general web space, I agree that monitoring things like the binlog cache usage is generally not very useful – that’s purely down to the average profile of transactions in the web space (lots of short transactions with small statement sizes). The odd case does come up where lots of large statements are grouped together in single transactions, and it can find be useful to tune then (but in those transactions it is such a small percentage of the time, there are often better things to tune for them as well.. But it all adds up if the transaction structure really needs to be that way).

As for alerting on LOCK_log waits, the instrumentation is available for all of that in performance_schema in 5.5 which a) monitors LOCK_log (mutex) waits down to picosecond resolution and b) monitors all threads, including connected slave IO threads.

Something like this would give the same kind of info (less the full stack of course) as PMP:

SELECT IF(pps.name = ‘thread/sql/one_connection’,
CONCAT(ips.user, ‘@’, ips.host),
REPLACE(name, ‘thread/’, ”)) user,
db,
command,
state,
time,
event_name AS last_wait,
IF(timer_wait IS NULL,
‘Still Waiting’,
timer_wait/1000000) last_wait_usec,
source
FROM performance_schema.events_waits_current
JOIN performance_schema.threads pps USING (thread_id)
LEFT JOIN information_schema.processlist ips ON pps.processlist_id = ips.id;

An example with a normal thread and slave IO thread:

*************************** 9. row ***************************
user: msandbox@localhost
db: NULL
command: Query
state: executing
time: 0
last_wait: wait/io/file/myisam/dfile
last_wait_usec: 19.4230
source: mi_dynrec.c:258
*************************** 10. row ***************************
user: msandbox@localhost:54550
db: NULL
command: Binlog Dump
state: Master has sent all binlog to slave; waiting for binlog to be up
time: 983
last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond
last_wait_usec: Still Waiting
source: log.cc:5533

You would just see a pile up on the “wait/synch/mutex/sql/LOG::LOCK_log” event for threads “Still Waiting”.

]]>
By: Xaprb http://www.xaprb.com/blog/2011/08/13/speaking-at-oracle-open-world/#comment-19574 Xaprb Sun, 14 Aug 2011 18:10:14 +0000 http://www.xaprb.com/blog/?p=2421#comment-19574 Mark,

I don’t think there is enough instrumentation to alert on this. The best general-purpose way I can think of is to look at the relay log position on replicas and compare it to the current log and position on the master.

The binlog cache hit rate I’m referring to is computed from SHOW GLOBAL STATUS as Binlog_cache_disk_use/Binlog_cache_use. I’ve never seen a case where that was a useful diagnostic, much less useful to alert on. Binlog_cache_disk_use per second could be something useful to compute manually if troubleshooting points to that as a source of problems, but it’s not something I’d bake into a monitoring tool a priori. I should note that my opinions on this have changed. I built Binlog_cache_disk_use/Binlog_cache_use into innotop three or four years ago. But that was a younger me, who also thought that the MyISAM key cache hit rate was important. I can only plead that I’ve learned better ways by spending time doing things that didn’t really produce results.

]]>
By: Mark Callaghan http://www.xaprb.com/blog/2011/08/13/speaking-at-oracle-open-world/#comment-19573 Mark Callaghan Sat, 13 Aug 2011 20:27:56 +0000 http://www.xaprb.com/blog/?p=2421#comment-19573 I have no idea how the binlog cache hit rate is measured, but there is a serious performance problem when replication slaves read old binlogs from the master. They do so while holding LOCK_log which blocks anything else from writing to the binlog until the read completes — http://bugs.mysql.com/bug.php?id=61545. So I think there should be a useful alert for this topic, but either monitoring is insufficient or we don’t know yet what to look for. Fortunately, this bug is easy to fix.

]]>