Stay Curious!

A very fast FNV hash function for MySQL

I wrote a User-Defined Function that implements the FNV (Fowler-Voll-No) hash function for MySQL. I’m not the first person to do this – in fact, I was inspired by the Google patches for MySQL. But my implementation is a little bit different from most, in a very important way that leads directly to much higher performance, especially suited for the Maatkit tools.

A bit of background: FNV hashing is a very fast hash algorithm that operates in fixed memory. It is widely used in lots of important areas in computer science. My implementation requires absolutely no malloc() calls, which is a darn good thing because I am not to be trusted with malloc(), having spent too many years programming in managed languages. I made it return a 64-bit integer, which matches the size MySQL uses internally for most integer arithmetic.

The most important thing I did was make my UDF accept 1 to infinity arguments. That means you can hash entire rows with a single function call. And that is very useful for the Maatkit table-checksumming tools, which tend to run about 8-10 times faster when they don’t have to make MySQL do a bunch of string concatenation. That translates directly to less impact on the server, and less slave lag (if that is a problem for you).

Here’s how my implementation works:

SELECT FNV_64(col1, col2, col3, .... colN) FROM ...

Compare this to MD5() hashing that accomplishes the same thing:

SELECT MD5(CONCAT_WS('#', col1, col2, col3, .... colN)) FROM ...

The UDF’s code is distributed with Maatkit, and I plan to eventually build it as a binary that can be installed without requiring you to compile it. However, compiling is very easy; there are instructions in the source code comments. Installing is also easy: just a simple SQL statement.

If you’re using Maatkit to make sure your slaves have the same data as their master, you should install the UDF on all your servers for a significant performance boost. You’ll save your servers a lot of work. You don’t need to do anything extra for Maatkit to take advantage of it. Maatkit will auto-detect it if you have it installed.

I’ve been running it in production for a couple of months now with nothing but good results. And the code is drop-dead simple, so I think the chance of bugs is very slim. But if you have questions, problems, bug reports etc, please use the Maatkit project page to report them.

Posted on Sun, Mar 9, 2008. Approximately 500 Words.

Databases