Idea: SEARCH With Regular Expression Sort
Spec TBD.
--
Contributors: PeterThoeny - 05 Oct 2006
Discussion
The discussion below was moved here from AutoIncTopicNameOnSave.
--
PeterThoeny - 05 Oct 2006
I think that the discussion about sort order begs the real question: Rather than padding numbers out to a fixed length for the sorting...
How about extending SORT to handle numeric order?
- Much as UNIX sort -k 4 -n does
Instead of getting into field numbers, provide a way of extracting a sort key - e.g. a regexp - and then specifying a sort order on that
- e.g. s/Item\([0-9][0-9]*\).*/\1\t&/ (sed syntax: prefix each line with its numeric key and a tab)
- and then sort -k 1 -n
?
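A minimal Perl sketch of the key-extraction idea above, assuming the sort key is the first capture group of a caller-supplied regular expression. The helper name sort_by_numeric_key and the sample topic names are made up for illustration; none of this is existing TWiki code.

```perl
use strict;
use warnings;

# Hypothetical helper: sort names numerically by the first capture group of
# $key_re. Names that do not match are placed last, in alphabetical order.
sub sort_by_numeric_key {
    my ( $key_re, @names ) = @_;
    return
        map  { $_->[2] }                      # undecorate: keep the name only
        sort {
               $a->[0] <=> $b->[0]            # matching names before the rest
            || $a->[1] <=> $b->[1]            # numeric comparison of the key
            || $a->[2] cmp $b->[2]            # tie-break on the full name
        }
        map  { ( $_ =~ $key_re ) ? [ 0, $1, $_ ] : [ 1, 0, $_ ] } @names;
}

my @sorted = sort_by_numeric_key( qr/^Item(\d+)/,
    qw(Item10 Item2 Item0005 WebHome Item1) );
print join( "\n", @sorted ), "\n";
```

The key is extracted once per name (decorate, sort, undecorate), so the regular expression is not re-run on every comparison.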
--
AndyGlew - 02 Oct 2006
Sort numerically by topic name: Not sure how this can be defined in a generic & useful way with a %SEARCH{}%.
--
PeterThoeny - 02 Oct 2006
If you support regexps
- define a regexp to extract the fields, concatenating them in order from primary through lesser keys
- concatenate using something standard - tab or the like
- this defines fields
- then specify a numeric/alphabetic sort on a field basis.
E.g. Item0-Subject, Item6565-subject
%SEARCH{ topic="Item*", sort_regexp( s/^Item\([0-9][0-9]*\).*/\1\t&/ ), field1=numeric }%
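The steps listed above could be sketched in Perl roughly as follows. This assumes the regular expression captures the sort fields in priority order and that every capture group takes part in a match. The helper name sort_by_fields and the 'numeric'/'alpha' type names are hypothetical, not an existing %SEARCH{}% parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: $re captures the sort fields, primary key first;
# $types says, per field, whether to compare as 'numeric' or 'alpha'.
sub sort_by_fields {
    my ( $re, $types, @names ) = @_;

    my $cmp = sub {
        my ( $x, $y ) = @_;
        for my $i ( 0 .. $#$types ) {
            my $r = $types->[$i] eq 'numeric'
                ? $x->[$i] <=> $y->[$i]
                : $x->[$i] cmp $y->[$i];
            return $r if $r;              # first differing field decides
        }
        return 0;
    };

    my ( @keyed, @rest );
    for my $name (@names) {
        my @fields = $name =~ $re;        # captures, in list context
        if   (@fields) { push @keyed, [ @fields, $name ] }
        else           { push @rest,  $name }
    }

    # Matching names in field order, then non-matching names alphabetically.
    return ( ( map { $_->[-1] } sort { $cmp->( $a, $b ) } @keyed ), sort @rest );
}

my @sorted = sort_by_fields(
    qr/^Item(\d+)-(.*)$/, [ 'numeric', 'alpha' ],
    qw(Item6565-subject Item0-Subject Item12-beta Item12-alpha)
);
print join( "\n", @sorted ), "\n";
```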
--
AndyGlew - 05 Oct 2006
This could be useful for some wiki applications, although it is a bit complex to use. We should find a spec that is easy to grasp and flexible. For example, sort with regex could be on topic name, a form field value, or a regex on topic text.
--
PeterThoeny - 05 Oct 2006
Do we need to specify the regular expression? Just specify "numeric" and let the code figure it out. The numeric is really a flag saying sort any embedded numbers as numbers.
I sketched out a test program that sorts a list of items containing either prefixed or postfixed numbers (i.e. 1item or item1). The code figures out which case applies and sorts accordingly.
Here is the test data:
item1
item2
item21
item31
item3
item04
item50
item0005
item100
The result:
item1
item2
item3
item04
item0005
item21
item31
item50
item100
The test code:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Compare two [part1, part2] pairs. Whichever part is all digits is
# compared numerically, the other part as a string.
sub by_numeric {
    if( $a->[0] =~ m/^\d+$/ && $b->[0] =~ m/^\d+$/ ){
        # Number first (e.g. 1item)
        return( $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] );
    } elsif( $a->[1] =~ m/^\d+$/ && $b->[1] =~ m/^\d+$/ ){
        # Number last (e.g. item1)
        return( $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] );
    }
    # Mixed forms: fall back to a plain string comparison so that
    # the comparator always returns a value.
    return( join('', @$a) cmp join('', @$b) );
} # by_numeric

sub Main {
    my(@data, @split);
    @data = <STDIN>;
    foreach ( @data ){
        $_ =~ s/[\r\n]+$//;    # strip trailing CR/LF
        # Lines that match neither pattern are skipped.
        if( $_ =~ m/^(\d+)([^0-9]+)$/ || $_ =~ m/^([^0-9]+)(\d+)$/ ){
            push(@split, [$1, $2]);
        }
    }
    print STDOUT "in: ", Dumper(\@data), "split", Dumper(\@split), "\n";
    print STDOUT "sort: ", Dumper([ sort by_numeric @split]), "\n";
    print STDOUT "joined:\n", join("\n", map { join('', @$_); } sort by_numeric @split), "\n";
}

Main();
Is this what you want? Could always be extended to handle embedded numbers if needed.
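One possible way to extend this to embedded numbers is a "natural" comparison: split each name into alternating digit and non-digit chunks and compare chunk by chunk. The following is only a sketch under that assumption. The names chunks and natural_cmp and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Split a name into alternating runs of digits and non-digits,
# e.g. "a10b2" becomes ("a", "10", "b", "2").
sub chunks { return $_[0] =~ /(\d+|\D+)/g }

# Compare two names chunk by chunk: numerically when both chunks are
# digits, as strings otherwise.
sub natural_cmp {
    my ( $x, $y ) = @_;
    my @x = chunks($x);
    my @y = chunks($y);
    while ( @x && @y ) {
        my ( $p, $q ) = ( shift @x, shift @y );
        my $r = ( $p =~ /^\d/ && $q =~ /^\d/ ) ? $p <=> $q : $p cmp $q;
        return $r if $r;
    }
    # One name ran out of chunks: the one with chunks left sorts later.
    # If both ran out (e.g. item7 vs item007), fall back to string order.
    return @x <=> @y || $x cmp $y;
}

my @sorted = sort { natural_cmp( $a, $b ) } qw(a10b2 a2b10 a2b9 a10b1);
print join( "\n", @sorted ), "\n";
```

With this comparator no prefix/postfix detection is needed, since a number is handled wherever it appears in the name.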
--
CraigMeyer - 06 Oct 2006
I experimented with extending the sort with order=numeric, splitting each topic name into a non-numeric part, a numeric part, and whatever is left. It seems to do what you wanted. Here are the code fragments:
sub by_numeric {
    return( $a->[0] cmp $b->[0] ||                      # 1st term non-numeric
            ($a->[1] || 0) <=> ($b->[1] || 0) ||        # 2nd term numeric (empty counts as 0)
            $a->[2] cmp $b->[2]                         # Optional 3rd term non-numeric
          );
} # by_numeric
In Search.pm, in "sub searchWeb", just before if( $sortOrder eq 'modified' ), add:
if( $sortOrder eq 'numeric' ){
    @topicList = map { join('', @$_); } sort by_numeric
                 map { ($_ =~ m/^([^0-9]+)(\d*)(.*)$/) ? [$1, $2, $3] :
                                                         [$_, '', '']; } @topicList;
} elsif( $sortOrder eq 'modified' ){
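To try these fragments outside Search.pm, something like the following standalone script could be used. The topic names are made-up sample data. The comparator treats an empty numeric part as 0, which is an assumption added here to avoid warnings under "use warnings".

```perl
use strict;
use warnings;

# Comparator from the fragment above: prefix as string, number as number,
# remainder as string. An empty numeric part is treated as 0.
sub by_numeric {
    return( $a->[0] cmp $b->[0] ||
            ( $a->[1] || 0 ) <=> ( $b->[1] || 0 ) ||
            $a->[2] cmp $b->[2] );
}

# Made-up topic names for illustration only.
my @topicList = qw(Item21 Item3 WebHome Item0005 Item3Notes Item04);

# Split each name into [prefix, number, rest], sort, then join back.
@topicList = map { join( '', @$_ ); } sort by_numeric
             map { ( $_ =~ m/^([^0-9]+)(\d*)(.*)$/ ) ? [ $1, $2, $3 ] :
                                                       [ $_, '', '' ]; } @topicList;

print join( "\n", @topicList ), "\n";
```

Note that a topic name starting with a digit does not match the pattern and is sorted as a plain string.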
--
CraigMeyer - 06 Oct 2006