Xaprb

Stay curious!

Archive for the ‘checksum’ tag

5 ways to make hexadecimal identifiers perform better on MySQL

with 7 comments

One of the most common patterns I see in my consulting work is identifiers that are generated by MD5() or UUID(). Many times this is done in an application framework or something similar — not software the client has written. From the application programmer’s point of view, it’s just an incredibly handy idiom: generate a unique value and use it, you’re done.

Those values tend to appear in session identifiers, but that’s not the only place; I especially notice them in apps that use Java’s Hibernate interfaces, whether session IDs are involved or not. They propagate themselves all around the other tables, where they become secondary indexes and even get combined with other columns to make even bigger keys.

What’s wrong with this? There are two major things that hurt performance in such cases: larger data and indexes, and non-sequential values. I’ll ignore the latter in this article, since whether an identifier is better off sequential or not is often dictated by specific technology choices (e.g. InnoDB’s clustered primary keys.) But I do want to mention about the larger data and show some ways to alleviate the performance impact of using hexadecimal strings as identifiers.

I’ll speak mostly about MySQL here, but the same techniques apply almost universally to any database.

One: watch out for character set

Take a look at the following EXPLAIN plan:

mysql> explain select * from t where id = '0cc175b9c0f1b6a831c399e269772661'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 98
          ref: const
         rows: 1
        Extra: Using index

Why is the index 98 bytes long? Simple — the utf8 character set was used:

CREATE TABLE `t` (
  `id` varchar(32) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

You don’t need to store hexadecimal values with utf8. It doesn’t actually use any more space on disk for the table itself (long complex topic I won’t get into) but it uses about three times more memory and disk space for sorts, group-by, implicit temp tables, and so on. This is MySQL-specific as far as I know.

Two: use fixed-length, non-nullable values

You can see I’m storing the value in a VARCHAR. It would be better to store it in CHAR if all the values are the same length. And here I’ve made it non-NULL, but in many cases I see such columns are permitted to be NULL even when they will always have a value. These are generic best-practice kind of suggestions and apply to all cases, not just this case of hex identifiers. And I should point out that these are low-impact suggestions — following them won’t make a dramatic difference in typical cases (although it does matter a lot sometimes). If you’re going to use utf8, stay away from fixed-length columns, by the way.

Three: Make it BINARY

You don’t actually need to store characters. The characters are just a way of representing numbers. Store them directly. For example, what is 00000000000000000000000000002E2A really? It’s really the base-16 representation of the (base-ten) number 11818 and is much better stored as an integer in 4 bytes (or less) instead of 32.

The problem is, these long values aren’t representable in commonly available integer storage sizes. They’re much bigger than a BIGINT. However, MySQL does permit you to store them in a BINARY column, which is just a string of bytes with no character set semantics. It’s more compactand it’s a lot faster to do comparisons (and hence index lookups). You can use the HEX() and UNHEX() functions to transform between the representations, or just use hexadecimal literal syntax:

mysql> select x'7861707262';
+---------------+
| x'7861707262' |
+---------------+
| xaprb         | 
+---------------+

This makes it easy to work with the values without having to do conversions in your code if that’s more convenient (although note that you’ll be sending more data from/to the server — but if it makes it easier for your code, it might be a reasonable compromise). So instead of

select * from t where id = '0cc175b9c0f1b6a831c399e269772661';

You can just add a single ‘x’ to the query, like this:

select * from t where id = x'0cc175b9c0f1b6a831c399e269772661';

After converting the table to use a BINARY(16) column and re-inserting the same data in binary form, look at the difference it makes:

explain select * from t where id = x'0cc175b9c0f1b6a831c399e269772661'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 16
          ref: const
         rows: 1
        Extra: Using index

The lookup is on a much smaller value. If you’re using a UUID() that is formatted with strings of hex characters separated by dashes, you can delete the dashes with REPLACE(); they are for visual formatting only and don’t have any semantics.

Four: use prefix indexes

In many but not all cases, you don’t need to index the full length of the value. I usually find that the first 8 to 10 characters are unique. If it’s a secondary index, this is generally good enough. The beauty of this approach is that you can apply it to existing applications without any need to modify the column to BINARY or anything else — it’s an indexing-only change and doesn’t require the application or the queries to change. This is a very low-hanging fruit and it’s usually an option even when other options are not feasible or desired.

Figuring out how much of the value to index can be complex when there is skew (more on this later). In the simplest cases you can just run a query like this:

mysql> select count(distinct id), count(distinct left(id, 8)), count(distinct left(id, 9)) from t\G
*************************** 1. row ***************************
         count(distinct id): 2
count(distinct left(id, 8)): 2
count(distinct left(id, 9)): 2

Here you can see I have a trivial 2-row test case so it’s not a good example. In the real world you can compare the number of distinct values (you can omit that for a primary key, but this technique generally applies best to secondary indexes which may not be unique) to the number of distinct prefixes, for each prefix length, and choose the shortest one that’s “good enough.” Then make an index on that prefix.

Five: build hash indexes

I’ve been working with one application that likes to combine such hex identifiers into “paths” similarly to a filesystem. So if you want to look up some row or other, you ask the database for “92eb5ffee6ae2fec3ad71c777531578f/0cc175b9c0f1b6a831c399e269772661″ or even longer combinations of values.

This does two things. One, it makes keys really, really really long. Two, it makes a lot of duplicate prefixes, so prefix indexing simply doesn’t work. The prefixes have skewed cardinalities so there are values for which a lot of rows will match.

When there’s skew like this, you need to compare the average cardinality to the worst-case, which you can find with queries such as

mysql> select left(id, 10) as left_10, count(*) as cnt from t
group by left_10 order by cnt desc limit 10;
+------------+-----+
| left_10    | cnt |
+------------+-----+
| 0cc175b9c0 |   1 | 
| 92eb5ffee6 |   1 | 
+------------+-----+

Of course here you see again my trivial test case, but you might see very bad results; you might have a half-million rows in the table and one-third of them start with 0cc175b9c0. Even if you make the prefix 32 characters long, you’ll still match one-third of the table. You don’t get a good usable prefix until 32 characters plus 8 characters — and a 40-character index is loooong. So prefix indexing really doesn’t work in cases like this.

What you can do is generate a checksum of the values and index that. That’s right, a hash-of-a-hash. For most cases, CRC32() works pretty well (if not, you can use a 64-bit hash function). Create another column:

mysql> alter table t add crc int unsigned not null, add key(crc);
mysql> update t set crc=crc32(id);
mysql> explain select * from t use index(crc) where id = '0cc175b9c0f1b6a831c399e269772661' and crc=crc32('0cc175b9c0f1b6a831c399e269772661')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: crc
          key: crc
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where

Now it’s using the 4-byte index on the CRC values. (I had to say “use index(crc)” because on my silly test case there’s so little data it used the PRIMARY key otherwise). You can keep the CRC column updated with a pair of triggers, or maintain it directly in your queries that modify the table. Triggers add side effects you might not desire.

The CRC column isn’t guaranteed to be unique, so you need both criteria in the WHERE clause or this technique won’t work. Hash collisions happen quickly; you will probably get a collision with about 100k values, which is much sooner than you might think — don’t assume that a 32-bit hash means you can put 4 billion rows in your table before you get a collision.

By the way, this isn’t a suggestion that’s unique to MySQL. Other database servers can do the same thing (and even in a more elegant way — many other database servers support indexes over expressions). Here’s an example of how to build a hash index with Microsoft SQL Server. (Anyone know what the underlying algorithm of CHECKSUM() is? It’s not CRC32)

Conclusion

Hexadecimal identifiers make your tables and indexes bigger, and slow down comparisons and lookups. My advice is usually “don’t do it,” but if you must use hexadecimal values for identifiers, hopefully this article has a few suggestions you can use to make them more efficient.

Written by Xaprb

February 12th, 2009 at 10:11 am

Posted in SQL

Tagged with , , , ,

Progress on Maatkit bounty, part 3

with 2 comments

This is the last day I’m taking off work to hack on mk-table-sync, and I thought it was time for (yet another) progress report. Here’s what I have done so far:

  • All the code, except for a tiny bit of “glue” and “setup” code, is in modules.
  • Lots more tests for the modules.
  • A new sync algorithm (I still haven’t rewritten the top-down and bottom-up, which are designed for network efficiency more than MySQL efficiency, and are very complicated). This algorithm is called “Chunk” and is based on the chunking module I’m re-using from two of the other tools. This allows syncing the table a bit at a time to avoid locking it so much.
  • The tool chooses its own parameters, including choosing the sync algorithm automatically by examining indexes.
  • Proper exit codes, as well as several other smaller issues requested via bug reports.
  • The tool now syncs entire servers. That is, you don’t have to specify a table. It’ll find all the tables and just sync them.
  • The tool can sync many servers. You give it five servers, it will treat the first as the source, and sync every table in the source to each of the four remaining servers in turn.
  • It can work via replication. It can discover a master’s slaves via SHOW SLAVE HOSTS and sync each slave to the master. You can also point it at a slave and it’ll discover the master, connect to it, and sync the slave to the master.
  • It integrates with mk-table-checksum’s results. If you’ve given the –replicate option to mk-table-checksum, the slave’s results are stored in a table. It can read that table and sync anything marked as different. This can be combined with sync-to-master and auto-discover-slaves functionality.
  • Lots of other bugs and problems are gone simply because I’m using the modules I wrote for other tools. This includes issues with table parsing, identifier quoting, etc etc. As an aside, I have to roll my own for almost everything, because I can’t rely on things like DBI’s quote_identifier() function — it does not work in earlier versions, which are amazingly common in the real world.

Whew! So what isn’t done yet?

  • Bi-directional syncing.
  • The Nibble sync algorithm. It will be preferred over Chunk and can be used in more cases.
  • Documentation.
  • Full support for wide characters. (This is non-trivial in Perl. I need to research it. A partial solution might not be hard, but I’m worried about the versions included in, for example, RHEL 3, which is very widely used.)
  • Updating other tools to work right with the changes to shared code.
  • Locking and transaction code. The tool will ultimately use FOR UPDATE/LOCK IN SHARE MODE automatically on InnoDB tables instead of locking them, for example.

Here’s a sample of what it can do, using a replication sandbox I set up with Giuseppe’s MySQL Sandbox. The sandbox contains a copy of the Sakila sample database. I’ll just mangle a few films on the slaves:

baron@kanga:~$ cd rsandbox_5_0_45/
baron@kanga:~/rsandbox_5_0_45$ ./s1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.0.45-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

slave1 [localhost] {msandbox} ((none)) > update sakila.film set title='academy dinosaur2' limit 12;
Query OK, 12 rows affected, 12 warnings (0.07 sec)
Rows matched: 12  Changed: 12  Warnings: 0

slave1 [localhost] {msandbox} ((none)) > Bye
baron@kanga:~/rsandbox_5_0_45$ ./s2
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.0.45-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

slave2 [localhost] {msandbox} ((none)) > update sakila.film set title='academy dinosaur2' limit 1;
Query OK, 1 row affected, 1 warning (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

slave2 [localhost] {msandbox} ((none)) > Bye

OK, now I’ve messed up the first 12 films on one slave, and the first 1 on another. I could just go ahead and sync them right away, but first I’ll do a table checksum to demonstrate that functionality:

baron@kanga:~/rsandbox_5_0_45$ mk-table-checksum --replicate=test.checksum --port=16045 127.0.0.1 -q

And now I’ll tell the sync tool to go fix the differences the checksum revealed:

baron@kanga:~/rsandbox_5_0_45$ mk-table-sync  --replicate=test.checksum h=127.0.0.1,P=16045 -vx
# Syncing P=16046,h=127.0.0.1
# DELETE INSERT UPDATE ALGORITHM DATABASE.TABLE
#      0      0     12 Chunk     sakila.film
#      0      0      0 Chunk     sakila.film_text
# Syncing P=16047,h=127.0.0.1
# DELETE INSERT UPDATE ALGORITHM DATABASE.TABLE
#      0      0      0 Chunk     sakila.film
#      0      0      0 Chunk     sakila.film_text
baron@kanga:~/rsandbox_5_0_45$ 

Pretty easy, huh? Take a look at the output: the first thing it did was fix the 12 films I changed. sakila.film has a trigger that updates sakila.film_text, so that table got changed too. The checksum tool caught this difference, but the differences were gone by the time the sync tool examined them, again due to the trigger. On the second slave, no differences were found at all, because the changes to the first slave were made on the master, automatically fixing the second slave. (This won’t always be the case, but it worked in this example).

While I’d love to continue building the perfect beast, I’m going to have to call it quits around noon today and start cleaning up, writing the documentation, and getting ready to release the code. I’m not sure how much I’ll finish in that time.

By the way, anyone who wants to is welcome to get the code from the Maatkit SVN repository! I never make a big deal out of that because I generally assume people want to run released code, but SVN is there if you want it…

Written by Xaprb

December 6th, 2007 at 7:29 am

Maatkit version 1297 released

without comments

Download Maatkit

Maatkit (formerly MySQL Toolkit) version 1297 contains a significant update to MySQL Table Checksum (which will be renamed soon to avoid trademark violations). The changelog follows. What you don’t see in the changelog is the unit test suite! I got a lot more of the code into modules that are tested and re-usable.

2007-11-18: version 1.1.19 

* Check for needed privileges on --replicate table before beginning. 
* Made some error messages more informative. 
* Fixed child process exit status with 8-bit right-shift. 
* Improved checksumming code auto-detects best algorithm and function. 
* Added --ignoreengine option; ignores federated and merge by default. 
* Added --columns and --checksum options. 
* Removed --chunkcol, --chunksize-exact, --index options. 
* --chunksize can be specified as a data size now. 
* Improved chunking algorithm handles more cases and uses fewer chunks. 
* Do not print --replcheck results for servers that are not slaves. 
* Create only one DB connection for each host, not one per host/tbl/chunk. 
* Code assumed backtick quoting, broke on SQL_MODE=ANSI (bug #1813030). 
* There were many potential bugs with database and table name quoting. 
* Child exit status errors could be masked by subsequent successes.

Written by Xaprb

November 19th, 2007 at 12:08 am

Posted in Uncategorized

Tagged with , ,