Moved KevinKinnell's comments from SearchEnhancements.
--
PeterThoeny - 29 Aug 2000
The search string is the logical place to put these enhancements; within that we have some options to think about.
First off, I do not like the idea of using Perl regexes. They would be very difficult to untaint, and if someone slipped a ``/.../e'' through, it would blow a big hole in security. So I think we need to parse out ``simple'' regexes (like character classes), logical connectors, and directives, e.g.
[A-Za-z0-9], AND, OR, NOT, FOLLOWEDBY, SOUNDSLIKE, etc.
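To make the idea concrete, here is a minimal sketch of evaluating a few of these connectors without ever treating user input as a regex. It is written in Python purely for illustration (not the Perl that wikisearch.pm uses), and the function names, the flat grammar (OR binds loosest, then AND, no parentheses) and the case-insensitive matching are all assumptions, not a settled syntax. Every search term is passed through re.escape, so regex metacharacters typed by the user are matched literally.

```python
import re

def clause_matches(clause, text):
    """Evaluate one clause: 'term', 'NOT term', or 'a FOLLOWEDBY b' (hypothetical grammar)."""
    words = clause.split()
    if len(words) == 3 and words[1] == "FOLLOWEDBY":
        # Order of appearance: first term, then the second term somewhere later.
        pattern = re.escape(words[0]) + ".*?" + re.escape(words[2])
        return re.search(pattern, text, re.IGNORECASE | re.DOTALL) is not None
    negate = len(words) == 2 and words[0] == "NOT"
    if negate:
        words = words[1:]
    if len(words) != 1:
        raise ValueError("cannot parse clause: %r" % clause)
    # re.escape turns the user's term into a literal; no user regex is ever compiled.
    found = re.search(re.escape(words[0]), text, re.IGNORECASE) is not None
    return found != negate

def query_matches(query, text):
    """True if any OR-alternative has all of its AND-clauses satisfied."""
    return any(
        all(clause_matches(c, text) for c in alternative.split(" AND "))
        for alternative in query.split(" OR ")
    )
```

For example, query_matches("blue AND NOT green", "a blue sky") is true, while query_matches("a.c", "abc") is false because the dot is matched literally.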
We need to figure out the syntax for this. As an example
... "blue AND -green" ...
- or -
... "blue AND NOT green" ...
Which is better? If we are using logical connectives, should we be able to specify order of appearance? Can we say
"... blue FOLLOWEDBY green ..."
Would it be good to be able to specify two search strings, one for topic and one for text? If we do that, we have huge search power at the cost of some more protocol decisions (and we probably need to start keeping track of the order in which options are seen).
topicstr="blue AND green" topiccasesensitive="on" topicregex="off" textstr="red AND NOT yell(ow|er)" textcasesensitive="off" textregex="on"
If we implement this, do we only search for the textstr in topics that matched the topicstr? (If we implement this, we will end up needing a form to help users construct inline search tags...Ha ha, only serious.)
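One possible answer to that question is "yes": filter by topic name first, then search the text of the surviving topics only. The sketch below (Python, for illustration only) assumes a hypothetical in-memory dict mapping topic name to topic text, and it takes the already-parsed pieces of the two search strings as separate arguments rather than parsing the tag itself. The function and parameter names are made up for this example.

```python
import re

def two_stage_search(topics, topic_terms, text_required, text_forbidden):
    """topics maps topic name -> topic text (hypothetical in-memory store)."""
    # Stage 1: literal, case-sensitive AND match on the topic name
    # (topiccasesensitive="on", regex off).
    candidates = [n for n in sorted(topics) if all(t in n for t in topic_terms)]
    # Stage 2: case-insensitive regex match on the text of the survivors only
    # (textcasesensitive="off", textregex="on").
    need = re.compile(text_required, re.IGNORECASE)
    avoid = re.compile(text_forbidden, re.IGNORECASE)
    return [n for n in candidates
            if need.search(topics[n]) and not avoid.search(topics[n])]
```

With topic_terms=["blue", "green"], text_required="red" and text_forbidden="yell(ow|er)", only topics whose names contain both "blue" and "green" ever have their text scanned, which keeps the expensive stage small.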
This is database-query level power, and the cost is the parsing. The internal ``grep'' is pretty easy in Perl...but the more we do, the bigger/slower-to-load/slower-to-run wikisearch.pm gets...
...unless we implement a twiki search server that runs as a daemon or service...hmmmmm...nah.
--
KevinKinnell - 06 Jun 2000
AND / OR search has also been discussed in SearchEnhancements and SearchEnhancmentsRFC; it is certainly a useful enhancement. I like the Altavista style search because it is easy to remember and fast to type:
+word +"consecutive words" -exclude
--
PeterThoeny - 29 Aug 2000
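As a rough illustration of the Altavista-style syntax above, the following Python sketch splits a query into required (+), excluded (-) and unsigned terms and applies them to a piece of text. The names are hypothetical, matching is by case-insensitive substring, and the rule for unsigned terms (at least one must appear when nothing is marked with +) is an assumption, not something settled in this discussion.

```python
import re

# A token is an optional sign followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Return (required, excluded, optional) lists of lower-cased terms."""
    required, excluded, optional = [], [], []
    buckets = {"+": required, "-": excluded, "": optional}
    for sign, phrase, word in TOKEN.findall(query):
        buckets[sign].append((phrase or word).lower())
    return required, excluded, optional

def altavista_match(query, text):
    required, excluded, optional = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    # Assumed rule: with no required terms, at least one unsigned term must appear.
    return bool(required) or not optional or any(term in text for term in optional)
```

Because terms are compared as plain substrings, nothing the user types is ever interpreted as a pattern.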
For those who need the AND search functionality on a Unix box, I can offer a quick and easy solution.
First, get the tool agrep from ftp://ftp.cs.arizona.edu/agrep; the tool can be used freely for any purpose, and redistribution is allowed on a nonprofit basis. On my Linux system, it compiles out of the box.
Change your local bin/wikicfg.pm (or lib/TWiki.cfg, if you are using the beta version) as follows:
$egrepCmd = "/usr/local/bin/agrep -d ZZZ";
Change the path to what your system needs.
It is important to change the ZZZ to some pattern that is not used in any of your documents. agrep searches on a per-record basis, and a record is delimited by what you set with the above option. Thus, if the ZZZ does not appear in your file, the whole file is treated as one record. Usually you want to find entries where two patterns are found in the same file, but not necessarily in the same line.
When using the advanced search, choose 'regular expression' search (this will invoke agrep). The search pattern
pattern1;pattern2
means pattern1 AND pattern2, and
pattern1,pattern2
means pattern1 OR pattern2.
For more information about what patterns are allowed, consult the included manual page agrep.1.
--
GuidoOstkamp - 16 Jun 2001
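The effect of the record delimiter described above can be mimicked in a few lines. This Python sketch is only an emulation of the behaviour as described in the post (literal substrings, ';' for AND, ',' for OR), not of agrep's actual matching engine; the function name is hypothetical, and the refusal to mix ';' and ',' is a simplification of this sketch.

```python
def record_search(text, pattern, delimiter="\n"):
    """Return the records of text that satisfy an 'a;b' (AND) or 'a,b' (OR) pattern."""
    if ";" in pattern and "," in pattern:
        raise ValueError("this sketch does not mix ';' and ','")
    if ";" in pattern:
        terms, combine = pattern.split(";"), all
    else:
        terms, combine = pattern.split(","), any
    # A record is whatever lies between two delimiters; if the delimiter
    # never occurs, the whole text is a single record.
    records = text.split(delimiter)
    return [r for r in records if combine(t in r for t in terms)]
```

With the default newline delimiter, "apple;banana" only matches lines containing both words. With a delimiter such as "ZZZ" that never occurs in the text, the same pattern matches the whole file as soon as both words appear anywhere in it.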
I had a go at agrep, thanks for the pointer. It's fine for searching for keywords, but you can't combine the AND feature (actually the -d option) with regular expressions, so I implemented a grep variant that does REs as well as AND. See CategorySearchForm for more details and (soon) an upload of the script. It works quite nicely - I just use pattern1;pattern2 like agrep.
--
RichardDonkin - 08 Jul 2001
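The script mentioned above is not shown on this page, so the following is only a guess at the general idea, written in Python for illustration: split the search string on ';', treat each part as a regular expression, and accept a file only if every expression matches somewhere in it. The function name is hypothetical, and a literal ';' inside a pattern is not supported by this sketch.

```python
import re

def file_matches_all(text, pattern):
    """True if every ';'-separated regex in pattern matches somewhere in text."""
    regexes = [re.compile(p) for p in pattern.split(";") if p]
    return all(r.search(text) for r in regexes)
```

For instance, the pattern r"Category:\s+Bug;Status:\s+(open|new)" accepts a topic that has both a matching Category line and a matching Status line, wherever they appear in the file.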