How to avoid imprecise DECIMAL math in MySQL
Posted in Databases on Mar 8, 2006
MySQL versions 4.1 and below use imprecise math in operations with
DECIMAL data, which is supposed to be precise (that’s the whole point). There is no real solution to the problem, though there are workarounds. There is also at least one genuine bug in MySQL related to this problem. In this article I’ll explain the problems, demonstrate them in action, and show you how to work around them.
Many fractional values cannot be represented exactly as a floating-point number in computers. For example, the value one-tenth, which we represent as 0.1 in base ten, is impossible to represent exactly in base two. SQL databases provide fixed-point data types to support precision math where it is needed. Currency values are a typical use.
DECIMAL data types, which store numbers as strings instead of as numbers so there are no issues representing them exactly, but versions prior to 5.0 perform operations on the values with floating-point math. For example, the
SUM function converts the string representation to floats before operating on them. This is documented in the MySQL manual.
I first encountered this problem while building a system to import my financial data into a database so I could query the transactions with SQL. I ran a query to find unbalanced transactions caused by splits that had been deleted:
select transaction, sum(amount) as amount from split group by transaction having sum(amount) <> 0;
Here’s the result:
sum(amount) <> 0 should have eliminated about half those tuples. The fact that it didn’t, combined with the presence of -0.00 (negative zero), made me suspect floating-point values were being used behind the scenes. The numbers were being displayed as fixed-point, but if I could display them as floating-point, I could verify my theory. It’s not possible to use
CAST to cast a value to floating-point in MySQL 4.1, but I accomplished the same thing by multiplying the
amount column by 1e1. When I did this, I found the numbers weren’t exactly zero; they were just close, for example, 3.1086244689504e-13.
After hunting around for a while without luck, I entered a bug report, which was changed to not-a-bug status with a friendly pointer to the documentation (oops!).
There’s still at least one real bug, though. The following query adds an
ORDER BY clause to the query above:
select... order by amount;
The results are interesting indeed!
What happened to the spurious results? They disappeared! An
ORDER BY clause is never supposed to do anything but order the results; it certainly should not eliminate tuples. I think this is kind of weird, creepy and cool at the same time.
If I had to guess, I’d say the results are being ordered before they are filtered by the
HAVING clause, and in the process, converted back to
DECIMAL from float, then eliminated by the
HAVING. That may not be what’s really happening, but it seems likely.
This strange behavior raises the possibility of improving the code, too. Ordering may not be as efficient as it could be; it should be the last operation in a
SELECT so it operates on as few tuples as possible. And an implicit cast forced by the ordering operation doesn’t seem right, either; it means the ordering clause changes the values in the tuples as well as changing which tuples are present in the output. Both should be of concern.
There is no way to force precision math in these types of operations. It cannot be done. No amount of casting or rounding will fix the problem reliably on every platform (math is different on every platform, and a “solution” may not work everywhere). There are a couple of workarounds, though.
having sum(amount) <> 0, use
having abs(sum(amount)) < .001.
ORDER BYclause! Just kidding.
With a little imagination, it’s probably possible to work around most situations. I hope this article helps you avoid possible problems with imprecise math.