Our first idea is rarely the best

This holds true for many aspect of life. But in programming, it’s nearly a constant. Even for very simple task,  it is hard to come with the very best solution right away.

Lets take, for example, determining if a number is prime or not. The definition of a prime number is (according to wikipedia):

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself.

Based on the definition, the first implementation that comes to mind for most people (including myself) is the following.

function IsPrime(AValue : Integer) : boolean;
var I : Integer;
begin
  Result := True;
  for I := 2 to AValue - 1 do
  begin
    if AValue mod I = 0 then
      EXIT(False);
  end;
end;

This is a valid implementation. It has quite a few flaws, for example, all numbers lower than 2 will be listed as primes. Lets assume the contract of the function states AValue needs to be a positive integer larger than 1, we can now say the function gives the right answer all the time. But, what is even better than getting the right answer? Getting it FAST!  And how do we do this? Well, one of the lesson I learned from Michael Abrash’s book Zen of Code Optimisation, –“Know your data!”.

Here, our data is numbers and number properties. The property we need to observe here is the symmetrical nature of number’s divisors. Lets start with a concrete number: 30.  Its divisors are :

30 – 1, 2, 3, 5, 6, 10, 15, 30

What we can observe here is : if wemultiply the Nth term from the beginning of the list and multiply it by the Nth term from the end of the list, we always get 30 . What lies straight in the middle of the list? The square root of the number we are observing.

30 – 1, 2, 3, 5,(√30), 6, 10, 15, 30

The conclusion here is : For any divisors of X greater than X’s square root, there is also a lower divisor. In other word, if we don’t really need to divide 30 by 15, since dividing 30 by 2 does the same test. With that new knowledge in mind, we can now improve our IsPrime function.

function IsPrime(AValue : Integer) : boolean;
var I : Integer;
begin
  Result := True;
  for I := 2 to Trunc(Sqrt(AValue)) do
  begin
    if AValue mod I = 0 then
      EXIT(False);
  end;
end;

So, we went from AValue – 2 divisions all the way down to √AValue – 1 divisions. There certainly isn’t any other optimization left, right? As a pure standalone function, that’s pretty much as far as we can get. But it is still possible to go further than this, depending how much memory we want to commit to the task.

With our latest revision of IsPrime, we start by dividing by 2, then by 3, then by 4… Wait!  If 2 doesn’t divide our number, 4 certainly won’t… Why do we test for 4? The main reason is, we don’t “know” 4 isn’t a prime number. If we knew it, we would know we already tested 4 indirectly through one of it’s factor(in this case, 2). Computing this information every time we call the function would be pretty process intensive.  But… keeping a list of all the primes would take lot of memory and would also take a while to compute, no?

Actually, no! Remember, we only need to have a list of all the primes up to the square root of the number we want to test. That means that, to test the largest possible 32 bits unsigned integer(4294967295), we only need the list of all primes number from 2 to 65536. Spoiler alert!  There is only 6542 primes in that range. Those can be computed in 10s of msecs on modern day computer and only take 13 kB of memory if stored as words. (We can go as low as 8 kB if we store them as an array of bits, but it would be a lot less efficient to work with). Now, what does our code looks like?

(Full code example with comments will be available for download soon)

type
TPrimes = class abstract
strict private class var
 FPrimeList : TList<Integer>;
 FHighestValueTested : Integer;
 class procedure LoadPrimesUpTo(AValue : Integer);
strict private
 class constructor Create;
 class destructor Destroy;
public
 class function IsPrime(AValue : Integer) : Boolean;
end ;

class function TPrimes.IsPrime(AValue: Integer): Boolean;
var I, iValueRoot, idx : Integer;
begin
 if AValue <= FHighestValueTested then
   Result := FPrimeList.BinarySearch(AValue, idx)
 else
 begin
   Result := True;
   iValueRoot := Trunc(Sqrt(AValue));
   LoadPrimesUpTo(iValueRoot);
   if not FPrimeList.BinarySearch(iValueRoot, idx) then
     Dec(idx);
   for I := 0 to Min(FPrimeList.Count - 1, idx) do
   begin
     if AValue mod FPrimeList.List[I] = 0 then
       EXIT(False);
   end;
 end;
end;

class procedure TPrimes.LoadPrimesUpTo(AValue: Integer);
var
 I: Integer;
begin
 for I := FHighestValueTested + 1 to AValue do
 begin
   if IsPrime(I) then
     FPrimeList.Add(I);
   FHighestValueTested := I;
 end;
end;

In this example, I dynamically grow the list as needed. So if we don’t need to test very large numbers, we use less memory. So, to test if 4294967295 is prime, we went from up to 4294967293 divisions down to a maximum of 6542 division. I think we did good job on this!
(On a second thought, it can be divided by 5, so not that much of a gain for that specific number! 😉 )

 

One thought on “Our first idea is rarely the best”

  1. The content of TPrimes.Create() is not declared in the code sample. Without this information, we do not know how the members are initialized.

    I initially thought that FHighestValueTested was 0, but this was causing an infinite loop. 1 then? Neither.

    Analyzing the code, I realized that the list HAD to contain at least one prime otherwise the idx variable would be negative in some instance.

    So I assumed this:
    class constructor TPrimes.Create;
    begin
    FPrimeList := TList.Create();
    FPrimeList.Add(2);
    FHighestValueTested := 2;
    end;

    class destructor TPrimes.Destroy;
    begin
    FPrimeList.Free();
    end;

    and it works … but it still could be optimized.

    As the code is, the function LoadPrimesUpTo() is doing an superfluous check each time. Indeed, this check:

    for I := FHighestValueTested to AValue do

    means that an already tested value is tested first each call. Useless. It also has the effect that on some occasion, the same prime will be added twice in the list.

    Simple optimisation/fix:

    for I := FHighestValueTested+1 to AValue do

    and you’re good.

Leave a Reply to Ralos Cancel reply

Your email address will not be published. Required fields are marked *