I wrote a User-Defined Function that implements the FNV (Fowler-Voll-No) hash function for MySQL. I’m not the first person to do this—in fact, I was inspired by the Google patches for MySQL. But my implementation is a little bit different from most, in a very important way that leads directly to much higher performance, especially suited for the Maatkit tools.
A bit of background: FNV hashing is a very fast hash algorithm that operates in fixed memory. It is widely used in lots of important areas in computer science. My implementation requires absolutely no malloc()
calls, which is a darn good thing because I am not to be trusted with malloc()
, having spent too many years programming in managed languages. I made it return a 64-bit integer, which matches the size MySQL uses internally for most integer arithmetic.
The most important thing I did was make my UDF accept 1 to infinity arguments. That means you can hash entire rows with a single function call. And that is very useful for the Maatkit table-checksumming tools, which tend to run about 8-10 times faster when they don’t have to make MySQL do a bunch of string concatenation. That translates directly to less impact on the server, and less replica lag (if that is a problem for you).
Here’s how my implementation works:
SELECT FNV_64(col1, col2, col3, .... colN) FROM ...
Compare this to MD5() hashing that accomplishes the same thing:
SELECT MD5(CONCAT_WS('#', col1, col2, col3, .... colN)) FROM ...
The UDF’s code is distributed with Maatkit, and I plan to eventually build it as a binary that can be installed without requiring you to compile it. However, compiling is very easy; there are instructions in the source code comments. Installing is also easy: just a simple SQL statement.
If you’re using Maatkit to make sure your replicas have the same data as their master, you should install the UDF on all your servers for a significant performance boost. You’ll save your servers a lot of work. You don’t need to do anything extra for Maatkit to take advantage of it. Maatkit will auto-detect it if you have it installed.
I’ve been running it in production for a couple of months now with nothing but good results. And the code is drop-dead simple, so I think the chance of bugs is very slim. But if you have questions, problems, bug reports etc, please use the Maatkit project page to report them.