Schwartzian Transform Misused
I've been reviewing the code in the
Search.pm
module and I have a comment regarding the use of the Schwartzian Transform for sorting by filenames. This is the most common type of sort requested for typical searches, but the Schwartzian Transform will only serve to slow everything down.
The code is thus:
} else {
# sort by filename, Schwartzian Transform
##TWiki::writeDebug "Topic list before sort = @topicList";
if( $revSort ) {
@topicList = map { $_->[1] }
sort {$b->[0] cmp $a->[0] }
map { [ $_, $_ ] }
@topicList;
} else {
@topicList = map { $_->[1] }
sort {$a->[0] cmp $b->[0] }
map { [ $_, $_ ] }
@topicList;
}
##TWiki::writeDebug "Topic list after sort = @topicList";
}
Apparently, someone is applying the Schwartzian Transform without thinking. Let me walk you through the logic, starting with the final map:
-
map { [$_, $_] }
make a list references to arrays, each with two elements. BOTH ELEMENTS ARE THE SAME. Pass this list to:
-
sort {$a->[0] cmp $b->[0]}
sort the list of references to two element arrays using the first element. Note that this really sorts the filenames in a conventional fashion.
-
map { $_->[1] }
return the second element, sorted according by reference, but these are just the filenames.
First, consider the fact that we don't need to duplicate the filenames. Let's consider just the forward sort:
@topicList = map { $_->[0] }
sort {$a->[0] cmp $b->[0] }
map { [ $_ ] }
@topicList;
The question is: is this any slower than the simple:
@topicList = sort {$a cmp $b } @topicList;
The answer is that the simple sort is almost 2x faster, (12 seconds as opposed to 20 seconds). You can try it yourself with this simple perl snippet:
# Test of Schwartzian Transform for simple string sort
$numloops = 100;
$numstrings = 10000;
$timestart = time();
for ($i = 0; $i < $numloops; $i++) {
# First, generate a bunch of strings to sort
@topicList = ();
foreach (0..$numstrings) {
$topicList[$_] = (rand(1000000000000000)).';'; # force it to be a string, not a number.
}
@topicList = map { $_->[0] }
sort {$a->[0] cmp $b->[0] }
map { [ $_ ] }
@topicList;
}
$timeend = time();
print "elapsed time for sorting $numstrings strings $numloops times using schwartzian: ". ($timeend-$timestart),"\n";
$timestart = time();
for ($i = 0; $i < $numloops; $i++) {
@topicList = ();
foreach (0..$numstrings) {
$topicList[$_] = (rand(1000000000000000)).';'; # force it to be a string
}
@topicList = sort {$a cmp $b } @topicList;
}
$timeend = time();
print "elapsed time for sorting $numstrings strings $numloops times using sort: ". ($timeend-$timestart),"\n";
1;
The result of this test is as follows:
elapsed time for sorting 10000 strings 100 times using schwartzian: 20
elapsed time for sorting 10000 strings 100 times using sort: 12
Therefore, please change this code as follows:
} else {
# sort by filename, Schwartzian Transform NOT NEEDED HERE!
##TWiki::writeDebug "Topic list before sort = @topicList";
if( $revSort ) {
@topicList = sort {$b cmp $a} @topicList;
} else {
@topicList = sort {$a cmp $b} @topicList;
}
##TWiki::writeDebug "Topic list after sort = @topicList";
}
Since this is the typical path through the code (sort by topic name) this should speed up execution a smidget.
Note that this is not a bug or really a new feature but just smart coding.
--
RaymondLutz - 01 Nov 2003
Thanks Raymond, performance related improvements are always welcome
This is now in
TWikiAlphaRelease and at TWiki.org.
--
PeterThoeny - 02 Nov 2003
Good work. I checked the CVS version and it looks correct. I checked the time of execution for the original version (which makes a redundant copy of the list of topics) and it is a bit worse, with 21 seconds compared with 12 seconds.
Thanks for your kind assistance.
--
RaymondLutz - 03 Nov 2003
And in fact, "sort { $a cmp $b }
@list" is just "sort
@list". No need for the sort block!
--
RandalSchwartz - 22 Oct 2004
Hey, Randal, just realized that Schwartzian Transform is named after you! Welcome aboard the TWiki train to the future!
--
PeterThoeny - 22 Oct 2004