Xaprb

Stay curious!

Why you should not use BIT columns in MySQL

with 10 comments

MySQL implements the BIT data type differently in different versions, and the behavior is not what one might expect. In this article I’ll explain how MySQL’s behavior has changed over time, what strange things can happen as a result, and how to understand and work around display issues. I’ll tell you about a serious bug I’ve found, and discuss differences in the BIT data type between MySQL and Microsoft SQL Server.

History

MySQL has supported the BIT data type for a long time, but only as a synonym for TINYINT(1) until version 5.0.3. Once the column was created, MySQL no longer knew it had been created with BIT columns.

In version 5.0.3 a native BIT data type was introduced for MyISAM, and shortly thereafter for other storage engines as well. This type behaves very differently from TINYINT.

Changed behavior

The old data type behaved just like the small integer value it really was, with a range from -128 to 127. The new data type is a totally different animal. It’s not a single bit. It’s a fixed-width “bit-field value,” which can be from 1 to 64 bits wide. This means it doesn’t store a single BIT value; it’s something akin to the ENUM and SET types. The data seems to be stored as a BINARY value, even though the documentation lists it as a “numeric type,” in the same category as the other numeric types. The data isn’t treated the same as a numeric value in queries, however. Comparisons to numeric values don’t always work as expected.

This change in behavior means it’s not safe to use the BIT type in earlier versions and assume upgrades will go smoothly.

Display issues

The client libraries, including the command-line client and all the GUI clients I’ve seen, don’t seem to know how to handle BIT values. They don’t display them as a series of ones and zeroes. For instance, the following code even breaks the alignment of the command-line output!

mysql> create table test (i bit not null default 0);
mysql> insert into test (i) values (1), (1), (0), (0);
mysql> select * from test;
+---+
| i |
+---+
|  |
|  |
|   |
|   |
+---+

As I mentioned above, the data seems to be stored internally, and transmitted through the client libraries, as a BINARY value, which is actually a string type in MySQL. How it displays depends on the width of the column. For example, if the column is 32 bits wide, it is treated as CHAR(4). If it’s 8 bits wide, it is treated as CHAR(1):

create table test(ch bit(8));
mysql> insert into test values (b'01011010');
mysql> select * from test;
+------+
| ch   |
+------+
| Z    |
+------+

To display the value as an integer, it has to be cast to another type. One way to do this is add 0 to the value: select ch + 0 from test;. Another way is select cast(ch as unsigned) from test;

Display width seems to be related to value with BIT, in contrast to what the manual’s section on Overview of Numeric Types states: “Display width is unrelated to the storage size or range of values a type can contain.” It appears that a field of size M can only store M bits, so it’s the storage size, not the display size, that’s affected. As I mentioned, bit values don’t display as ones and zeroes anyway, so it makes no sense to say an 8-bit wide column “displays with a width of 8.” It doesn’t display 8 bits, it stores 8 bits. In fact, inserting b'100000000' into the table I defined above will store the value 255, demonstrating that the actual value has a maximum capacity of 8 bits. Any bits not set explicitly to 1 are set to 0, so values are left-padded with zeroes (the most significant bits are zeroes).

Bugs

I’ve discovered some very strange bugs with BIT columns in MySQL. The issue I noticed was a LEFT OUTER JOIN failing when it should have succeeded. I discovered a combination of factors could cause the bug to appear and vanish. For example, the join will succeed or fail depending on combinations of these factors:

  1. the presence of an additional column, not involved in the query at all
  2. the presence of additional rows which don’t match in the join
  3. the order of columns in the table
  4. the presence of an additional tautology in the join clause

I’ve filed a bug with MySQL about the issues I found, including a script which demonstrates several ways to trigger the bug.

Why use it?

Given the problems I’ve mentioned, I recommend avoiding it entirely. It provides nothing that’s not already possible with standard numeric types and adds a lot of confusion.

The only reason I personally would consider using this data type in MySQL is to document the actual usage of the values in it, as described in an earlier article on choosing column types. Unfortunately, I think it’s counterintuitive enough that I wouldn’t even choose to use it there, because I think a BIT column intuitively sounds like it ought to store a single bit.

Differences from Microsoft SQL Server

Microsoft SQL Server also provides a BIT data type. However, it’s completely different; it’s a single-bit column. Internally, it is stored as a single bit within an integer data type. As successive BIT columns are added to a table, SQL Server packs them together behind the scenes. This is equivalent to doing bitmask operations on a single column (my previous employer loved bitmask columns!), but it allows the bits to be named explicitly, avoiding the need to pass around named constants (or embed magic numbers) and deal with bitwise arithmetic.

Pros and cons of bitmask columns

Bitmask columns (an integer within which each bit is retrieved and set via bitwise arithmetic) can be extremely handy. They’re a very compact way to pack true/false values together for efficient storage. They can also facilitate certain types of queries; for example, “if any value is set” queries become simple. I’ve used them in ACLs stored in a database, for instance. Certain types of problems are just easy to solve with bitwise arithmetic, and for those problems, creating an integer column and declaring “bit 5 is whether the value is [something]” makes a lot of sense.

On the negative side, bitmask columns can be hard to use. For one thing, they’re hard to understand. Without the documentation that says which bit means what, they’re pretty much useless. SQL Server avoids this and the other problems I’ll name by treating each bit as a separate column and naming it, but that’s only if you use that facility, which my previous employer didn’t. Bitwise arithmetic can also be pretty tricky to write, and even harder to read.

Magic numbers in queries are just as meaningless as a column named bitcolumn1. Declaring and passing around constants to name the magic numbers is a nice thought, but it’s error-prone and it’s extra work. Creating a table to define the bits can be quasi-helpful as well, unless (as often happened at my previous employer) you can’t find the table, or the column is named such that you can’t tell which table defines the values for which column, or the table’s values don’t make any sense for bitwise arithmetic.

Bitmask columns are also not index-friendly, so querying against them isn’t optimal. Of course, any column with only two values is useless to an index anyway, so this is no worse, performance-wise, than storing the yes/no values in columns by themselves. Since there’s less data to examine, it can actually be more efficient.

Finally, there can be some subtleties about bitwise arithmetic, such as issues relating to signed and unsigned numbers.

The bottom line is, I think bitmask columns should be used sparingly, and only when the nature of the data and computations really makes them the obvious choice.

Written by Xaprb

April 11th, 2006 at 5:03 am

Posted in SQL

10 Responses to 'Why you should not use BIT columns in MySQL'

Subscribe to comments with RSS or TrackBack to 'Why you should not use BIT columns in MySQL'.

  1. Good article, thanks. Saved me time.

    For ColdFusion Users: an updated MySQL Connector/J allows ColdFusion to connect to MySQL 5, but does not seem to play well with these new MySQL BIT fields (at least as of now). Using TINYINT instead of BIT seems to be the best work-around.

    Marek

    12 May 06 at 2:36 pm

  2. [...] I found this article that explains how it works, and that you should avoid it. Why you should not use BIT columns in MySQL – Xaprb Why you should not use BIT columns in MySQL [...]

  3. Your previous employer is probably the one who should read this…

    I hope he/she loves bitmasks because they truly provide a necessary performance or storage advantage.

    …As opposed to bad reasons for loving them, e.g.:

    • They are nifty (which they are, no doubt!)
    • They are a vestige of an old architecture from the days when storage and CPU cycles were not so cheap.
    • They are a vestige of an old mentality from the days when storage and CPU cycles were not so cheap.

    In a general sense, here in “modern times”, I think individually named columns/variables are a superior solution. I think in the vast majority of cases, the potential performance and storage gains of bitmasks are at least negated, if not outweighed, by the readability — and thus maintainability — difficulties they pose. (I mean … if simple magic numbers are bad, then these, which are essentially compound magic numbers, must be compound bad!)

    Just one man’s considered opinion…

    Now off to convert all my BIT columns to TINYINTs!

  4. MySQL Administrator 1.1.9 for Windows exports my bit(1) fields as strings when executing a backup, for example ‘\0′ (our server version is Server version 5.0.24a-max and the table in question has ENGINE=ndbcluster).

    Ole V. Villumsen

    12 Mar 07 at 9:46 am

  5. [...] far so good… …until you read the blog post called “Why you should not use BIT columns in MySQL” by Xaprb. Another serious deficiency is the fact that a database dump will not export bit data as [...]

  6. [...] Yet another difference between MySQL and SQL Server concerning small details, as described in here. [...]

  7. Here’s another reason not to use this type in your tables:

    Using the MySQL .NET connector (5.1.4), and running a SELECT query that returns a column of BIT type, I could find no valid cast to retrieve the value of the column into a variable in my C# code. I tried bool, Byte, and Int16, and it always threw the exception “The specified cast is not valid.” I didn’t try the larger integers, assuming they wouldn’t work either.

    In fact, I ran across this blog while searching for a solution to this problem.

    For instance:

    MySqlCommand cmd = MySqlConn.CreateCommand();
    cmd.CommandType = CommandType.Text;
    cmd.CommandText = “SELECT bitField FROM myTable”;

    MySqlDataReader dr = cmd.ExecuteReader();
    Object ob;
    while (dr.Read())
    {
    if ((ob = dr[0]) != DBNull.Value)
    bool BitField = (bool)ob;
    }

    gave me this exception, as did making BitField a Byte and Int16 type and trying those casts. I also tried simply doing BitField = (bool)(dr["bitField"]); Same result.

    Once I changed the column type to TINYINT and my code variable type to Byte, it worked fine. So thanks!

    (I would submit this as a bug to MySQL, but they would just say “That’s the way it was designed, therefore it’s not a bug.” Uh huh…)

    (Consulting Webster to find out whether “cat” is synonymous with “feline”… or is this a trick question?)

    Craig

    7 Jan 08 at 1:08 am

  8. How about using bit fields to solve search “within” search queries on the magnatude of web search engine data stores?

    rebump

    29 Jan 08 at 5:41 pm

  9. Found your blog entry when I started getting inexplicable problems with BIT…

    I found a problem using NET connector 5.0.8.1 and version MySQL version 5.1.22 (so it persists). My problem was that data in a column defined as BIT was appearing as ‘1′ when I selected the table with a simple select statement, but appeared as ‘0′ when I added a LEFT JOIN to another table !!! I have moved this column to TINYINT. And now I need to fix a load of other columns previously defined as BIT.

    Tyrone

    30 Apr 08 at 7:47 am

  10. Thanks for the info, Xaprb, I use to have this problem but I never realized why. It’s very frustating to use MySql when you always use MSSQL.

    Regards.

    Suru

    4 Dec 09 at 12:08 am

Leave a Reply