Stay Curious!

Charset support in MySQL is really not all that complex

The headline is flame-bait, don’t take it. I just wanted to point something out about character sets and collations in MySQL.

To the uninitiated, it may seem overwhelming. Everything has a character set! Everything has a collation! And they act weirdly! The server has one. The database has one (oh, and it changes magically as I USE different databases.) Every table has one, and columns too. Is that all? NO! My connection has one! Kill me now!

Relax. In truth, only one kind of thing actually has a charset/collation. That is values. And values are stored in columns. The only thing that really has a charset/collation is a column.[1]

What about all the rest of those things – connection, database, server, table? Those are just defaults, which determine what charset/collation a value gets if it isn’t overridden. So if the table’s default charset is utf8, and you add a column without saying what it should be – why, it’ll be utf8. If the database’s default is latin1 and you add a table without saying what its default should be, it’ll be latin1.

[1] It’s not quite true. Literal values in SQL statements are values too, as are @user_variables. But if your connection’s charset is latin1 and you say SELECT "xaprb", without an explicit introducer, you’re really saying SELECT _latin1 "xaprb". Again, the only thing that really has a charset/collation is a value. Other things are just defaults.

Posted on Mon, Feb 22, 2010. Approximately 300 Words.

Databases