Xaprb

Stay curious!

A very fast FNV hash function for MySQL

with 6 comments

I wrote a User-Defined Function that implements the FNV (Fowler-Voll-No) hash function for MySQL. I’m not the first person to do this — in fact, I was inspired by the Google patches for MySQL. But my implementation is a little bit different from most, in a very important way that leads directly to much higher performance, especially suited for the Maatkit tools.

A bit of background: FNV hashing is a very fast hash algorithm that operates in fixed memory. It is widely used in lots of important areas in computer science. My implementation requires absolutely no malloc() calls, which is a darn good thing because I am not to be trusted with malloc(), having spent too many years programming in managed languages. I made it return a 64-bit integer, which matches the size MySQL uses internally for most integer arithmetic.

The most important thing I did was make my UDF accept 1 to infinity arguments. That means you can hash entire rows with a single function call. And that is very useful for the Maatkit table-checksumming tools, which tend to run about 8-10 times faster when they don’t have to make MySQL do a bunch of string concatenation. That translates directly to less impact on the server, and less slave lag (if that is a problem for you).

Here’s how my implementation works:

SELECT FNV_64(col1, col2, col3, .... colN) FROM ...

Compare this to MD5() hashing that accomplishes the same thing:

SELECT MD5(CONCAT_WS('#', col1, col2, col3, .... colN)) FROM ...

The UDF’s code is distributed with Maatkit, and I plan to eventually build it as a binary that can be installed without requiring you to compile it. However, compiling is very easy; there are instructions in the source code comments. Installing is also easy: just a simple SQL statement.

If you’re using Maatkit to make sure your slaves have the same data as their master, you should install the UDF on all your servers for a significant performance boost. You’ll save your servers a lot of work. You don’t need to do anything extra for Maatkit to take advantage of it. Maatkit will auto-detect it if you have it installed.

I’ve been running it in production for a couple of months now with nothing but good results. And the code is drop-dead simple, so I think the chance of bugs is very slim. But if you have questions, problems, bug reports etc, please use the Maatkit project page to report them.

Written by Xaprb

March 9th, 2008 at 6:40 pm

6 Responses to 'A very fast FNV hash function for MySQL'

Subscribe to comments with RSS or TrackBack to 'A very fast FNV hash function for MySQL'.

  1. Hi! Great work. I’ve been using FNV hash in several applications in the recent past (e.g. Ketama consistent hashing lib for Memcache).
    Last week I found an ever faster hash algorithm, called MurmurHash. It may be worth to check it out: http://tanjent.livejournal.com/756623.html

    Robin

    10 Mar 08 at 2:17 am

  2. Have you submitted a feature request to get this function added to MySQL? md5() is slower and produces a much larger result. crc32() isn’t a good hash function. People are now forced to add your UDF or apply th e Google patch to get this and most will do neither.

    Mark Callaghan

    10 Mar 08 at 1:01 pm

  3. Robin, thanks for the tip.

    Xaprb

    10 Mar 08 at 1:28 pm

  4. Mark, thanks for the suggestion. I just submitted it: http://bugs.mysql.com/35188

    My overall feature request is basically “Integrate the Google patches already!” :-)

    Xaprb

    10 Mar 08 at 1:35 pm

  5. making md5() and crc32() allow multiple arguments would be a good feature request, too.

    jim

    10 Mar 08 at 1:50 pm

  6. A little off-topic, but something I’d like to see is more compact encodings of hash values - your FNV can scrape through as it can fit into a 64-bit int, but when using a longer 128- or 160-bit hash, it would be nice to produce base-62 (DNS-safe) or base-64 strings instead of the default inefficient [0-9a-f] hex encoding.

    Marcus

    26 Mar 08 at 11:33 am

Leave a Reply