Shlemiel the painter strikes again

Once upon a time, I was introduced to Shlemiel.  Who is Shlemiel? For those too lazy to look up the link, he’s the guy from this joke :

Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. “That’s pretty good!” says his boss, “you’re a fast worker!” and pays him a kopeck.
The next day Shlemiel only gets 150 yards done. “Well, that’s not nearly as good as yesterday, but you’re still a fast worker. 150 yards is respectable,” and pays him a kopeck.
The next day Shlemiel paints 30 yards of the road. “Only 30!” shouts his boss. “That’s unacceptable! On the first day you did ten times that much work! What’s going on?”
“I can’t help it,” says Shlemiel. “Every day I get farther and farther away from the paint can!”

We can all agree Shlemiel wasn’t working very efficiently! Spotting inefficiency in tasks in the physical world is pretty easy because it’s very concrete. Doing the same in the programming world is a lot harder since we don’t usually observe the execution of our code, we usually only observe the end result. It’s very easy to copy Shlemiel’s without even realizing it.

Us, Delphi programmers, we can find example of Shelmiel’s work pretty close to home. There’s a few of those in Delphi’s VCL/RTL. The one I’m going to introduce today is StringReplace.

StringReplace is notoriously slow for large string, and after profiling the function to figure out why it is so slow, I’m pretty amazed Embarcadero didn’t bother to fix its implementation as it is very simple to fix.

The source of the problem lies here :

Offset := AnsiPos(Patt, SearchStr);
if Offset = 0 then
Result := Result + NewStr;
Result := Result + Copy(NewStr, 1, Offset - 1) + NewPattern;
NewStr := Copy(NewStr, Offset + Length(OldPattern), MaxInt); //This line is a problem
if not (rfReplaceAll in Flags) then
Result := Result + NewStr;
SearchStr := Copy(SearchStr, Offset + Length(Patt), MaxInt); //This line is a problem

The 2 flagged line take together more than 90% of execution time when called with a large string as input. Now, what happens here is this :

Shlemiel realized how inefficient his methods were. So the next day, shlemiel decided to transport to the whole road to his paint can. That way, he would need to walk a lot less to paint the road!

Everytime the pattern is matched, they make a whole copy of NewStr and a whole copy of SearchStr. This makes the algorithm run in O(n2) time. Fixing this is relatively simple:  Instead of taking the road to the paint can, take the paint can to the road! … or in other word, scanning the string in place.

It is a relatively simple change to make, yet, it takes the function from O(n2) to O(n). A simple test, loading Winapi.Windows.pas as input and replacing “A” with “B” would take  about 2 minutes to run with the original code, but less than 10 msec when scanning in-place.

If you are curious about the new implementation for StringReplace, I made my implementation available in a unit that automatically replace the implementation from System.StrUtils. You will need :

  • ab.Patch.System.SysUtils.pas
  • ab.MemUtils.pas

If there’s another function that’s you guys would like me to take a look at, let me know in the comments.