Xaprb

Stay curious!

How to coordinate distributed work with MySQL’s GET_LOCK

with 8 comments

This article explains how I replaced file-based methods to ensure only one running instance of a program with MySQL’s GET_LOCK function. The result is mutual exclusivity that works in a distributed environment, and it’s dead simple to implement.

My current employer used to use a technique similar to the classic Perl ‘Highlander’ method to ensure only one instance of a certain program would run at any given time. The method was to create a file with a certain name and then get an exclusive, non-blocking lock on the file with the flock() call. If another program tried to get the same lock, it would find the file already locked and exit with a message about “another running instance found.”

This type of locking can be pretty tricky to get right. There are always lots of edge cases to consider. For example, what if the same program tries to get the lock twice and blocks itself?

Trying to roll your own locking functionality is asking for trouble, unless you really know what you’re doing. It’s not enough to just create some “sentinel” that indicates “something else is running.” It must be done atomically and in a non-blocking fashion — already tough to hand-roll — and there are lots of other requirements, such as “the lock must be released if the program dies without having a chance to release it.” This is all old news to readers who’ve worked with threading or other concurrency issues in programs, of course.

The file-based solution works well on a single machine, but it doesn’t work well if you need programs on different machines to play nicely together. Using a network filesystem such as NFS doesn’t solve that problem either. It may work on certain systems, but it’s not portable.

We had this problem at my employer. We were beginning to distribute parts of the workload out to cloned servers, each of whom would do part of the work based on which machine and what time of day, and we needed to make sure they didn’t stomp on each other if they miscalculated what work they should be doing. Several options came to mind, all of them using some functionality already provided for us. For instance, we could choose a server as the arbitrator and write a daemon on it to accept network connections. Holding a socket open to that daemon would represent a lock. If the daemon already had a connection from a certain program, it would deny the connection to other programs. If the program with the lock died, the lock server’s operating system would know, the daemon would know, and it could make the socket open to another instance.

I was just about to take this approach when I stumbled upon MySQL’s GET_LOCK function, completely by accident. I immediately realized we could use it. The syntax is GET_LOCK(str, timeout), and the behavior is to attempt for timeout time to get a “lock,” returning a value indicating whether it succeeded. The “lock” is exclusive, and setting timeout to zero makes the call non-blocking, which is perfect.

I’m putting “lock” in quotes because it’s not really a normal database lock, in the way you might be used to thinking of it. It’s not a lock on a row, or a lock on a table, or anything like that. It’s a lock on an arbitrary string value, and like flock(), it’s completely advisory; nothing forces a program to wait for the lock — it is up to the programs to play nicely together.

The actual SQL call is select COALESCE(GET_LOCK('some_string_value', 0), 0). The call returns 1 if the lock was granted, and 0 or NULL otherwise (hence the COALESCE()). The lock is released either explicitly, or when the connection closes. There are more subtleties to the function’s behavior, but for this purpose, this is all we need to worry about (you can read the manual to learn more about the subtleties).

We did raise the question “what if the DB server is down?” but quickly answered it: every program that does anything important talks to the DB server, so it’s a moot point. Besides, if it’s down, we have bigger problems than two programs processing the same data twice.

Unfortunately, as far as I know no other database vendor has provided something similar to this incredibly handy function.

Are there other ways to implement this easily? I can’t imagine an easier way — even the flock() solution was harder to get right than this — but if you have other ideas, let me know. I’m especially interested in how you’d do this without a DB server.

Written by Xaprb

July 26th, 2006 at 10:32 pm

Posted in Uncategorized

8 Responses to 'How to coordinate distributed work with MySQL’s GET_LOCK'

Subscribe to comments with RSS

  1. Yes, this lock is perfect for a project I’m working on too. I use it to protect a critical section – an algorithm that operates on one table. A stored procedure would use the lock in the same way.

    PostgreSQL could achieve the same effect (for my purposes) using LOCK TABLE.

    Toby

    27 Jul 06 at 12:43 pm

  2. Hi!

    Locking of any sort, IMHO, causes scalability and concurrency issues, even if it is a non-blocking read. Wouldn’t a more scalable solution be to simply not set locks at all and resolve conficts at runtime based on a modified date? This is the standard model of distributed version control.

    Thoughts?

    -Jay

    @Toby: I believe that LOCK TABLE would cause a blocking read in PostgreSQL, no? Just like in MySQL, if you issue a LOCK TABLES … WRITE or SELECT … FOR UPDATE, it will be a blocking read AFAIK…

    Jay Pipes

    27 Jul 06 at 3:53 pm

  3. Sure, for work that’s done in the database locks are a bad thing. In my case, the applications are working outside the database, and we just use MySQL as a central point of control. Remember GET_LOCK() doesn’t actually get a lock on anything, per se.

    The types of applications we’re trying to make mutually exclusive are such things as a script to upload a set of ads to a search engine. Allowing more than one to proceed, and then trying to resolve conflicts later, will most certainly not work :-) The work must be done by one and only one program. Another example is rolling up the tables I referred to a couple articles ago. We don’t want that happening twice at once, as it’s nothing but double work to produce the same answer.

    So we actually use these “locks” to just make sure only one thing can run (hence the “highlander” name — “there can be only one!!!!” — and no, I’ve not seen the movie)

    Xaprb

    27 Jul 06 at 4:09 pm

  4. Also, If the session is interactive, you also have to worry about any interactive_timeout or wait_timeout sessions. Otherwiseif the session disconnects, the GET_LOCK will release the lock immediately.

    Partha Dutta

    28 Jul 06 at 11:52 am

  5. I’m having a hard time to use “GET_LOCK()” function.
    Can comeone explain how this work and put some sample code.

    I’m using InnoDb storage even. But when I apply the function it gives nothing.

    please help.
    Thanks!

    Shehan

    13 Apr 08 at 11:16 pm

  6. @Shehan: I believe get_lock() has nothing to do with what storage engine you’re using. It does no table locking and does not prevent any other db session from running queries on any table. The only thing that happens is if one process runs get_lock(‘name’), and a second process runs get_lock(‘name’) before the original process has called release_lock(), the second process will hang until the original lock is released.

    Sathya

    22 Aug 08 at 1:49 pm

  7. [...] How to coordinate distributed work with MySQL’s GET_LOCK at Xaprb (tags: mysql lock function) [...]

  8. For the archives: Postgres has had LOCK .. NOWAIT from 2005 on.

    http://www.postgresql.org/docs/8.0/static/release-8-0.html

    Jason Dusek

    2 May 12 at 3:28 pm

Leave a Reply