Floating or drowning

I never went to the university. After high school, I didn’t expect to last much longer in school. Thankfully, or so I thought back then, there is a lot of different technical degree/training available on the market.  So, when I had to choose between 6 more school years through university or 3 years (in Québec’ s wonderful CÉGEP), the choice was pretty easy.

Now, I can’t say the formation I had was bad. But looking back on it, I feel like a lot of critical information was missing. I did quite a few blunders because of information I didn’t get. I learned from my mistakes… But learning before making mistakes is even better.

One of the thing I wish they had explained in my classes is how floating points data types work.  I had so many WTF moments working with floating points variable before I finally understood what was going wrong, it’s not even funny. If you don’t know anything funny going on with floating points, get ready to have your mind blown.

So… Lets take this simple routine:

const
  MY_VALUE = 0.7;

procedure TForm1.Button1Click(Sender: TObject);
var dVal : Double;
begin
  dVal := MY_VALUE;

  if dVal <> MY_VALUE then
    ShowMessage('OMG! My CPU fails basic arithmetics!');
end;

So now, I’m asking you: Do you think the message will pop-up? The short answer is YES! (The long answer is : It depends).

The first thing that needs to be understood is that most floating points value cannot be expressed precisely in binary. Some data type like BCD(binary coded decimal) works around that problem, but BCD’s performance isn’t as good as other datatypes and thus, not usually used for most purpose. So, once encoded, what is 0.7 encoded as? I’ll use the following figure for illustration purpose :

64bits : 0.70000000003
80bits : 0.69999999999

The 2nd part of the problem come from the fact that floating literals in Delphi are of type Extended(80 bits). So, here’s what is happening with our code.  dVal is a double (only have 64 bits precision). When we compare it with MY_VALUE,  Delphi consider it like a floating point literal (thus 80 bits).  Since it can’t compare apples and oranges , it makes some juice, or in this case, upscale dVal to 80 bits precision.

Now, why wouldn’t dVal become 0.69999999999 once upscaled to 80 bits? Because it doesn’t contains 0.7, but really 0.70000000003. To work around those problems, the Math unit contains several CompareValue functions that are designed to test floating point values.

Now, for the long answer. Some of you might have tested the code above and didn’t get the popup message, while some others did. Why is that? New compiler, new rules.  One rule that did change is that under the 64 bits compiler of Delphi, the Extended type is now 64 bits.  So in WIN64, all the floating points stays in 64 bits format and doesn’t suffer from rounding/conversion errors.

From what I read, Extended became an alias to Double because the compiler is using SSE2 instructions for floating point operations instead of the x87 instructions it uses in Win32.

As for the other platforms (iOS, Android), I’m usable to test at this time.

Still, while the specifics may change, the problems linked to floating points conversion remains the same no matter which platform/language you use. If you want to dig further on the subject, you can read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

5 thoughts on “Floating or drowning”

  1. Would probably have worked with 0.75 or 0.5 though.

    The problem was that 80bits integer were note a “standard”. When dealing with floating points, one usually refers to single or double precision. 32 or 64 bits. That’s how they’re used in the CPU (FPU). There are no 80 bits floating points operations nor registers in the 80×86/80×87. The Delphi Extended type was probably dealt with in software from day 1 because of this and that was ok at the time since before the Pentium, one was not sure to have a FPU in its CPU…

    Anyway. As a rule of thumb, never compare two floating points directly in your code if you’re not sure they come from the same source/operation. If you have to, use a function that will accept a tolerance.

    1. Wikipedia’s article on x87 mentions :

      The x87 provides single precision, double precision and 80-bit double-extended precision binary floating-point arithmetic as per the IEEE 754-1985 standard. By default, the x87 processors all use 80-bit double-extended precision internally[…]

      So 80 bits float aren’t a Delphi-only paradigm. As for pre-pentium era… Was that when dinosaurs still roamed the earth? 😉

      1. The 80bit format was only an _internal_ format, used internally for intermediate, temporary values to improve precision.

        There are no FADD, FSUB, FMUL, FDIV or FCOMP instructions using 80bit floats, only 32 and 64 bits.

Leave a Reply

Your email address will not be published. Required fields are marked *