November | 2014 | All bits are off!

I never went to the university. After high school, I didn’t expect to last much longer in school. Thankfully, or so I thought back then, there is a lot of different technical degree/training available on the market. So, when I had to choose between 6 more school years through university or 3 years (in Québec’ s wonderful CÉGEP), the choice was pretty easy.

Now, I can’t say the formation I had was bad. But looking back on it, I feel like a lot of critical information was missing. I did quite a few blunders because of information I didn’t get. I learned from my mistakes… But learning before making mistakes is even better.

One of the thing I wish they had explained in my classes is how floating points data types work. I had so many WTF moments working with floating points variable before I finally understood what was going wrong, it’s not even funny. If you don’t know anything funny going on with floating points, get ready to have your mind blown.

So… Lets take this simple routine:

const
  MY_VALUE = 0.7;

procedure TForm1.Button1Click(Sender: TObject);
var dVal : Double;
begin
  dVal := MY_VALUE;

  if dVal <> MY_VALUE then
    ShowMessage('OMG! My CPU fails basic arithmetics!');
end;

So now, I’m asking you: Do you think the message will pop-up? The short answer is YES! (The long answer is : It depends).

The first thing that needs to be understood is that most floating points value cannot be expressed precisely in binary. Some data type like BCD(binary coded decimal) works around that problem, but BCD’s performance isn’t as good as other datatypes and thus, not usually used for most purpose. So, once encoded, what is 0.7 encoded as? I’ll use the following figure for illustration purpose :

64bits : 0.70000000003
80bits : 0.69999999999

The 2nd part of the problem come from the fact that floating literals in Delphi are of type Extended(80 bits). So, here’s what is happening with our code. dVal is a double (only have 64 bits precision). When we compare it with MY_VALUE, Delphi consider it like a floating point literal (thus 80 bits). Since it can’t compare apples and oranges , it makes some juice, or in this case, upscale dVal to 80 bits precision.

Now, why wouldn’t dVal become 0.69999999999 once upscaled to 80 bits? Because it doesn’t contains 0.7, but really 0.70000000003. To work around those problems, the Math unit contains several CompareValue functions that are designed to test floating point values.

Now, for the long answer. Some of you might have tested the code above and didn’t get the popup message, while some others did. Why is that? New compiler, new rules. One rule that did change is that under the 64 bits compiler of Delphi, the Extended type is now 64 bits. So in WIN64, all the floating points stays in 64 bits format and doesn’t suffer from rounding/conversion errors.

From what I read, Extended became an alias to Double because the compiler is using SSE2 instructions for floating point operations instead of the x87 instructions it uses in Win32.

As for the other platforms (iOS, Android), I’m usable to test at this time.

Still, while the specifics may change, the problems linked to floating points conversion remains the same no matter which platform/language you use. If you want to dig further on the subject, you can read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

All bits are off!

Monthly Archives: November 2014

Floating or drowning

About computer science, programming and more!