Category Archives: Data structure

Evolving from the primitive world

One of the novelty of the recent versions of Delphi is the built-in record helper for the primitive types (string, Integer, etc.). Record helper are essentially class helper for record (and apparently primitive types). The new class helper allows code syntax like

[Code language=”delphi”]
ShowMessage(1.ToString);
[/code]

to compile and work the same as the good old

[Code language=”delphi”]
ShowMessage(IntToStr(1));
[/code]

One of the drawback of record helper is, like class helper, there can only be 1 “active” at any given point in the source code. Any new helper we declare will hide or be hidden by (depending on scope resolution) Delphi’s helper. Also, record helpers are pretty good for generic operations on primitives, but what about more specific operations? It is not ideal to have something like this compile:

[Code language=”delphi”]
var
sFirstName : string;
begin
[…]
sFirstName.GetAreaCode;
[/code]

This problem is especially true for string, as we quite often use strings formatted in specific ways to give them special meanings.

My favorite approach when I have to work with such string is to create what I like to call “Smart Primitive”.

What I like about this concept is that, for one, the validity of the format is done at the moment of initialization. That means that any routine that use the record can assume the value is in an accepted format. [1]

So, what is a smart primitive?

  1. Smart primitives are opaque records. Only their methods should use their internal fields. Very often, they will only have a single internal field.
  2. They implement at least 1 implicit conversion to and from a primitive type. That is how the internal field of the record should be initialized.

Now, lets get a little more concrete.  Earlier, I gave the GetAreaCode example, so lets see how a PhoneNumber basic smart primitive would look like.

In North America, phone numbers always have 10 digits, the area code being the first 3 [2]. Now, lets say we want to allow users to enter any of those formats :

  1. (999)999-9999
  2. 999-999-9999
  3. (999)-999-9999
  4. 9999999999

Here’s a basic implementation :

[Code language=”delphi”]
TabPhoneNum = record
private
FPhoneNum : INT64;
FOriginalValue : string;
public
class operator Implicit(S : String) : TabPhoneNum;
class operator Implicit(APhoneNum : TabPhoneNum) : String;
[…]
function GetAreaCode : Integer;
end;

class operator TabPhoneNum.Implicit(S: String): TabPhoneNum;
var sNormalized : String;
begin
sNormalized := NormalizePhoneNum(S); //Remove “(“,”)”,”-” and ” ”
FPhoneNum := StrToInt64(sNormalized ); //If the normalized string is not a valid integer, we want to raise
if GetDigitsCount(FPhoneNum) <> 10 then
Raise Exception.Create(S + ‘ is not a valid phone number’);
FOriginalValue := S;
end;

function TabPhoneNum.GetAreaCode : Integer;
begin
//We don’t need to validate the format of FOriginalValue, it’s been done already
//We don’t need to validate the number of digits, it’s been done already
Result := (FPhoneNum div 10000000); // we don’t need to “mod 100” because we validated it is 10 digits
end;

function TForm1.GetAreaCode(const APhoneNum : TabPhoneNum) : Integer;
begin
Result := APhoneNum.GetAreaCode;
end;

procedure TForm1.Button1Click(Sender : TObject);
begin
ShowMessage(GetAreaCode(Edit1.Text).ToString);
//here, if Edit1.Text is not in a valid format, it will raise an exception before calling Tform1.GetAreaCode.
end;
[/code]

In the implementation of GetAreaCode, we see one of the main advantage of smart primitives. We don’t have to deal with all the different string format and we don’t have to validate the number of digits in FPhoneNum because it was already done when the record was first initialized.

Of course, we could simply implement GetAreaCode like this :

[Delphi]
function GetAreaCode(const S : string) : Integer;
begin
Result := Copy(S, 2, 3).ToInteger;
end;
[/Delphi]

and go based on the design by contract philosophy, and state that the input string needs to start with “(” followed immediately by the area code, etc. But that puts the burden of the contract on the caller. For a single method, it might make sense, but if you have a collection of methods having the exact same preconditions in the contract, it would make more sense laying out the contract’s preconditions as being the smart primitive’s contract instead, and then let its methods work with an already enforced contract.

This also has the advantage of “scoping” the methods where they belong. One too many time I have opened a unit filled with routine taking a single string as a parameter, none of them accepting the same “kind” of string.

So, how do you like this? Already doing it? Like it? Hate it?  Let me know what you think!

1.I say “Accepted” and not “Valid” on purpose. A smart primitive can allow invalid values.
2.Except in Mexico where it varies. For the sake of simplicity, I’ve excluded this fact.

Floating or drowning

I never went to the university. After high school, I didn’t expect to last much longer in school. Thankfully, or so I thought back then, there is a lot of different technical degree/training available on the market.  So, when I had to choose between 6 more school years through university or 3 years (in Québec’ s wonderful CÉGEP), the choice was pretty easy.

Now, I can’t say the formation I had was bad. But looking back on it, I feel like a lot of critical information was missing. I did quite a few blunders because of information I didn’t get. I learned from my mistakes… But learning before making mistakes is even better.

One of the thing I wish they had explained in my classes is how floating points data types work.  I had so many WTF moments working with floating points variable before I finally understood what was going wrong, it’s not even funny. If you don’t know anything funny going on with floating points, get ready to have your mind blown.

So… Lets take this simple routine:

const
  MY_VALUE = 0.7;

procedure TForm1.Button1Click(Sender: TObject);
var dVal : Double;
begin
  dVal := MY_VALUE;

  if dVal <> MY_VALUE then
    ShowMessage('OMG! My CPU fails basic arithmetics!');
end;

So now, I’m asking you: Do you think the message will pop-up? The short answer is YES! (The long answer is : It depends).

The first thing that needs to be understood is that most floating points value cannot be expressed precisely in binary. Some data type like BCD(binary coded decimal) works around that problem, but BCD’s performance isn’t as good as other datatypes and thus, not usually used for most purpose. So, once encoded, what is 0.7 encoded as? I’ll use the following figure for illustration purpose :

64bits : 0.70000000003
80bits : 0.69999999999

The 2nd part of the problem come from the fact that floating literals in Delphi are of type Extended(80 bits). So, here’s what is happening with our code.  dVal is a double (only have 64 bits precision). When we compare it with MY_VALUE,  Delphi consider it like a floating point literal (thus 80 bits).  Since it can’t compare apples and oranges , it makes some juice, or in this case, upscale dVal to 80 bits precision.

Now, why wouldn’t dVal become 0.69999999999 once upscaled to 80 bits? Because it doesn’t contains 0.7, but really 0.70000000003. To work around those problems, the Math unit contains several CompareValue functions that are designed to test floating point values.

Now, for the long answer. Some of you might have tested the code above and didn’t get the popup message, while some others did. Why is that? New compiler, new rules.  One rule that did change is that under the 64 bits compiler of Delphi, the Extended type is now 64 bits.  So in WIN64, all the floating points stays in 64 bits format and doesn’t suffer from rounding/conversion errors.

From what I read, Extended became an alias to Double because the compiler is using SSE2 instructions for floating point operations instead of the x87 instructions it uses in Win32.

As for the other platforms (iOS, Android), I’m usable to test at this time.

Still, while the specifics may change, the problems linked to floating points conversion remains the same no matter which platform/language you use. If you want to dig further on the subject, you can read What Every Computer Scientist Should Know About Floating-Point Arithmetic.