Category Archives: Optimization

Shlemiel the painter strikes again

Once upon a time, I was introduced to Shlemiel.  Who is Shlemiel? For those too lazy to look up the link, he’s the guy from this joke :

Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. “That’s pretty good!” says his boss, “you’re a fast worker!” and pays him a kopeck.
The next day Shlemiel only gets 150 yards done. “Well, that’s not nearly as good as yesterday, but you’re still a fast worker. 150 yards is respectable,” and pays him a kopeck.
The next day Shlemiel paints 30 yards of the road. “Only 30!” shouts his boss. “That’s unacceptable! On the first day you did ten times that much work! What’s going on?”
“I can’t help it,” says Shlemiel. “Every day I get farther and farther away from the paint can!”

We can all agree Shlemiel wasn’t working very efficiently! Spotting inefficiency in tasks in the physical world is pretty easy because it’s very concrete. Doing the same in the programming world is a lot harder since we don’t usually observe the execution of our code, we usually only observe the end result. It’s very easy to copy Shlemiel’s without even realizing it.

Us, Delphi programmers, we can find example of Shelmiel’s work pretty close to home. There’s a few of those in Delphi’s VCL/RTL. The one I’m going to introduce today is StringReplace.

StringReplace is notoriously slow for large string, and after profiling the function to figure out why it is so slow, I’m pretty amazed Embarcadero didn’t bother to fix its implementation as it is very simple to fix.

The source of the problem lies here :


Offset := AnsiPos(Patt, SearchStr);
if Offset = 0 then
begin
Result := Result + NewStr;
Break;
end;
Result := Result + Copy(NewStr, 1, Offset - 1) + NewPattern;
NewStr := Copy(NewStr, Offset + Length(OldPattern), MaxInt); //This line is a problem
if not (rfReplaceAll in Flags) then
begin
Result := Result + NewStr;
Break;
end;
SearchStr := Copy(SearchStr, Offset + Length(Patt), MaxInt); //This line is a problem

The 2 flagged line take together more than 90% of execution time when called with a large string as input. Now, what happens here is this :

Shlemiel realized how inefficient his methods were. So the next day, shlemiel decided to transport to the whole road to his paint can. That way, he would need to walk a lot less to paint the road!

Everytime the pattern is matched, they make a whole copy of NewStr and a whole copy of SearchStr. This makes the algorithm run in O(n2) time. Fixing this is relatively simple:  Instead of taking the road to the paint can, take the paint can to the road! … or in other word, scanning the string in place.

It is a relatively simple change to make, yet, it takes the function from O(n2) to O(n). A simple test, loading Winapi.Windows.pas as input and replacing “A” with “B” would take  about 2 minutes to run with the original code, but less than 10 msec when scanning in-place.

If you are curious about the new implementation for StringReplace, I made my implementation available in a unit that automatically replace the implementation from System.StrUtils. You will need :

  • ab.Patch.System.SysUtils.pas
  • ab.MemUtils.pas

If there’s another function that’s you guys would like me to take a look at, let me know in the comments.

Culture shocks

Like many discipline, programming is a team one. Yes, you can occasionally encounter a programmer earning a living while programming in his basement all alone. But since a lone programmer can only achieve so much on his own, most sizable projects require many programmers to cooperate.

Working with others always has it’s own set of challenges. Personality, culture, religion and many more can come as obstacles in teamwork. As programmers, even thought process can causes conflicts.

Not so long ago, I encountered a function written by a coworker that baffled me. It was only 7 lines of code, but it took me a moment to understand what it really did. I eventually went to ask the original programmer his explanation of his implementation decisions (Added as code comment, the original code had no comments).

function GetSurObj(ASurObj : TSurObj; AObj : TObject) : TSurObj;
begin
  //First, I initialize my Result
  Result := nil; 
  //Then I validate the input parameters
  if (ASurObj = nil) and (AObj = nil) then
    EXIT;
  //if the ASurObj = nil but AObj isn't, try to find a proper ASurObj 
  if (ASurObj = nil) and (AObj <> nil) then
    Result := FindSurObj(AObj)
  else
    // if we get here, it means ASurObj <> nil
    Result := ASurObj;
end;

When I finally understood the actual use of the function, I couldn’t believe how complicated it ended up being. To me, the following makes a lot more sense.

function GetSurObj(ASurObj : TSurObj; AObj : TObject) : TSurObj;
begin
  Result := ASurObj; 
  if (Result = nil) and (AObj <> nil)  then 
    Result := FindSurObj(AObj)
end;

Both functions have the merit of returning exactly the same result and making the exact same validations. But it seems to me it’s a lot easier to understand the purpose of the function looking at my version where the most significant line of code is the 1st one, compared to my coworker’s version where the most significant line is the last one.

I can see the merit of initializing result and validating input parameters, but I feel that, in this situation, it just adds way too much noise to a very simple function.

What do you think? Leave a comment about which implementation you prefer and why.

One of the caveat of high level programming

After I decided to study computer science, but before I actually started classes, I was scared. I was scared computer science (programming mostly) would be way too hard for me.

Part of that scare came from my total lack of knowledge on the subject. All I knew about EXE files was what I learned opening those in a hex editor.  I started thinking programming was about writing “stuff” in hexadecimal, and if aligned properly inside a file, magic happened! Thankfully, it is not the case. If only I had known what a compiler was back then…

Compiler are powerful tools that allows to transform instructions in plain text into machine code. The use of compilers has many advantages. They can hide the details of the hardware. A new CPU comes around with new, better performing instructions for a specific task? As soon as an update to the compiler is available you can simply update the compiler and rebuild your project without altering a single line of code and you can take advantage of the new instructions. There’s more than 1 sequence of instructions that can do what you ask on the hardware level?  Well, the compiler knows which one (or should know) performs it best.

But compilers can create quite a few “problems”. One of the problems that arise from compiler technologies is that pretty often, programmers are not 100% aware of what the machine does “behind the scene” given some lines of codes.  The Delphi language is especially rich in “implicit” stuff going around. Notably, string/array management.

When I started to work with the Exception class, one of the thing I wondered was, what was the point of its constructor CreateFmt. I was asking myself “What’s the difference between those 2 lines:”

  raise Exception.Create(Format(SSomeConstant, [1]));
  raise Exception.CreateFmt(SSomeConstant, [1]);

It took me many years before I stumbled upon information that allowed me to extrapolate the reason for the existence of CreateFmt. Granted, the difference between the 2 isn’t meaningful for most(99%) intent and purpose.

The reason why the 2 exists is partly because of the compiler doing a lot more than we know about. Lets take an example

procedure CheckValue(AValue : Integer);
begin
  if AValue > 10 then
    raise Exception.create(Format(SValueTooHigh, [AValue]));
end;

what you are really doing is

procedure CheckValue(AValue : Integer);
var ImplicitStringVariable : string;
begin
  ImplicitStringVariable := '';
  try
    if AValue > 10 then
    begin
      ImplicitStringVariable := Format(SValueTooHigh, [AValue])
      raise Exception.create(ImplicitStringVariable);
    end;
  finally
    DecRefCount(ImplicitStringVariable);
  end;
end;

while using Exception.CreateFmt really do only

procedure CheckValue(AValue : Integer);
begin
  if AValue > 10 then
    raise Exception.createFmt(SValueTooHigh, [AValue]);
end;

Of the few tests I’ve made, Using Exception.CreateFmt instead of Exception.Create(format) made the function about 8 times faster when no exceptions where raised. In situation where performance matters, it’s quite a difference. (ok, in situation where performance matters, exceptions wouldn’t be used 😉 )

Moral of the story, the higher level the language, the more things happening implicitly in the background. And those things doesn’t always make sense.  This video express it better than I could ever do.

Our first idea is rarely the best

This holds true for many aspect of life. But in programming, it’s nearly a constant. Even for very simple task,  it is hard to come with the very best solution right away.

Lets take, for example, determining if a number is prime or not. The definition of a prime number is (according to wikipedia):

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself.

Based on the definition, the first implementation that comes to mind for most people (including myself) is the following.

function IsPrime(AValue : Integer) : boolean;
var I : Integer;
begin
  Result := True;
  for I := 2 to AValue - 1 do
  begin
    if AValue mod I = 0 then
      EXIT(False);
  end;
end;

This is a valid implementation. It has quite a few flaws, for example, all numbers lower than 2 will be listed as primes. Lets assume the contract of the function states AValue needs to be a positive integer larger than 1, we can now say the function gives the right answer all the time. But, what is even better than getting the right answer? Getting it FAST!  And how do we do this? Well, one of the lesson I learned from Michael Abrash’s book Zen of Code Optimisation, –“Know your data!”.

Here, our data is numbers and number properties. The property we need to observe here is the symmetrical nature of number’s divisors. Lets start with a concrete number: 30.  Its divisors are :

30 – 1, 2, 3, 5, 6, 10, 15, 30

What we can observe here is : if wemultiply the Nth term from the beginning of the list and multiply it by the Nth term from the end of the list, we always get 30 . What lies straight in the middle of the list? The square root of the number we are observing.

30 – 1, 2, 3, 5,(√30), 6, 10, 15, 30

The conclusion here is : For any divisors of X greater than X’s square root, there is also a lower divisor. In other word, if we don’t really need to divide 30 by 15, since dividing 30 by 2 does the same test. With that new knowledge in mind, we can now improve our IsPrime function.

function IsPrime(AValue : Integer) : boolean;
var I : Integer;
begin
  Result := True;
  for I := 2 to Trunc(Sqrt(AValue)) do
  begin
    if AValue mod I = 0 then
      EXIT(False);
  end;
end;

So, we went from AValue – 2 divisions all the way down to √AValue – 1 divisions. There certainly isn’t any other optimization left, right? As a pure standalone function, that’s pretty much as far as we can get. But it is still possible to go further than this, depending how much memory we want to commit to the task.

With our latest revision of IsPrime, we start by dividing by 2, then by 3, then by 4… Wait!  If 2 doesn’t divide our number, 4 certainly won’t… Why do we test for 4? The main reason is, we don’t “know” 4 isn’t a prime number. If we knew it, we would know we already tested 4 indirectly through one of it’s factor(in this case, 2). Computing this information every time we call the function would be pretty process intensive.  But… keeping a list of all the primes would take lot of memory and would also take a while to compute, no?

Actually, no! Remember, we only need to have a list of all the primes up to the square root of the number we want to test. That means that, to test the largest possible 32 bits unsigned integer(4294967295), we only need the list of all primes number from 2 to 65536. Spoiler alert!  There is only 6542 primes in that range. Those can be computed in 10s of msecs on modern day computer and only take 13 kB of memory if stored as words. (We can go as low as 8 kB if we store them as an array of bits, but it would be a lot less efficient to work with). Now, what does our code looks like?

(Full code example with comments will be available for download soon)

type
TPrimes = class abstract
strict private class var
 FPrimeList : TList<Integer>;
 FHighestValueTested : Integer;
 class procedure LoadPrimesUpTo(AValue : Integer);
strict private
 class constructor Create;
 class destructor Destroy;
public
 class function IsPrime(AValue : Integer) : Boolean;
end ;

class function TPrimes.IsPrime(AValue: Integer): Boolean;
var I, iValueRoot, idx : Integer;
begin
 if AValue <= FHighestValueTested then
   Result := FPrimeList.BinarySearch(AValue, idx)
 else
 begin
   Result := True;
   iValueRoot := Trunc(Sqrt(AValue));
   LoadPrimesUpTo(iValueRoot);
   if not FPrimeList.BinarySearch(iValueRoot, idx) then
     Dec(idx);
   for I := 0 to Min(FPrimeList.Count - 1, idx) do
   begin
     if AValue mod FPrimeList.List[I] = 0 then
       EXIT(False);
   end;
 end;
end;

class procedure TPrimes.LoadPrimesUpTo(AValue: Integer);
var
 I: Integer;
begin
 for I := FHighestValueTested + 1 to AValue do
 begin
   if IsPrime(I) then
     FPrimeList.Add(I);
   FHighestValueTested := I;
 end;
end;

In this example, I dynamically grow the list as needed. So if we don’t need to test very large numbers, we use less memory. So, to test if 4294967295 is prime, we went from up to 4294967293 divisions down to a maximum of 6542 division. I think we did good job on this!
(On a second thought, it can be divided by 5, so not that much of a gain for that specific number! 😉 )