SearchEnhancementsWithAndOr < Codev

Moved KevinKinnell's comments from SearchEnhancements -- PeterThoeny - 29 Aug 2000

The search string is the logical place to put these enhancements; within that we have some options to think about.

First off, I do not like the idea of using Perl regexes. They would be very difficult to untaint, and if someone slipped a ``/.../e'' through it would blow a big hole in security. So, I think we need to parse out ``simple'' regexes (like character classes), logical connectors, and directives, e.g. [A-Za-z0-9], AND, OR, NOT, FOLLOWEDBY, SOUNDSLIKE, etc.
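One way to picture the ``simple'' subset is a whitelist check that runs before anything is handed to a regex engine. The sketch below is in Python rather than TWiki's Perl, purely for illustration; the name compile_simple and the exact set of allowed characters are assumptions, not anything TWiki defines.

```python
import re

# Assumed "simple" subset: letters, digits, spaces, and bracketed character
# classes such as [A-Za-z0-9]. Groups, alternation, and any embedded-code
# constructs are rejected because their characters are not on the whitelist.
SIMPLE = re.compile(r"(?:[A-Za-z0-9 ]|\[[A-Za-z0-9-]+\])+")

def compile_simple(pattern):
    """Return a compiled regex if the pattern is in the subset, else None."""
    if not SIMPLE.fullmatch(pattern):
        return None
    try:
        return re.compile(pattern)
    except re.error:
        # Whitelisted characters can still form an invalid class, e.g. [z-a].
        return None
```

Under these assumptions, compile_simple("yell[oe]w") yields a usable pattern, while compile_simple("yell(ow|er)") returns None, so the caller can fall back to a plain keyword search or report an error.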

We need to figure out the syntax for this. As an example:

... "blue AND -green" ...
- or -
... "blue AND NOT green" ...

Which is better? If we are using logical connectives, should we be able to specify order of appearance? Can we say "... blue FOLLOWEDBY green ..."?
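The two spellings need not be an either/or choice: a parser can accept both and reduce them to the same internal form. The following is a minimal Python sketch (not TWiki code) under some assumptions: keywords are upper case, OR binds more loosely than AND, words are matched whole and case-insensitively, and FOLLOWEDBY is left out. The names parse and matches are hypothetical.

```python
import re

def parse(query):
    """Return a list of OR-clauses; each clause is a list of (word, negated)."""
    # Only letters, digits, and a leading '-' survive tokenizing, so regex
    # metacharacters never reach the matcher.
    tokens = re.findall(r"-?[A-Za-z0-9]+", query)
    clauses, clause, negate = [], [], False
    for tok in tokens:
        if tok == "OR":
            clauses.append(clause)
            clause, negate = [], False
        elif tok == "AND":
            continue  # AND is implied between terms of the same clause
        elif tok == "NOT":
            negate = True
        else:
            if tok.startswith("-"):
                tok, negate = tok[1:], True
            clause.append((tok.lower(), negate))
            negate = False
    clauses.append(clause)
    return clauses

def matches(query, text):
    """True if any non-empty OR-clause has all of its terms satisfied."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return any(
        clause and all((word in words) != negated for word, negated in clause)
        for clause in parse(query)
    )
```

With this sketch, "blue AND -green" and "blue AND NOT green" parse to the same clause list, so the choice of surface syntax becomes a usability question rather than an implementation one.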

Would it be good to be able to specify two search strings, one for topic and one for text? If we do that we have huge search power at the cost of some more protocol decisions (and we probably need to start keeping track of the order options are seen in.)

topicstr="blue AND green" topiccasesensitive="on" topicregex="off" textstr="red AND NOT yell(ow|er)" textcasesensitive="off" textregex="on"

If we implement this, do we only search for the textstr in topics that matched the topicstr? (If we implement this, we will end up needing a form to help users construct inline search tags...Ha ha, only serious.)
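One possible answer to that question is a two-stage search: narrow by topic name first, then search the text only in the topics that survived. The Python sketch below illustrates that option only. It assumes topics are available as a name-to-text mapping and that both search strings have already been reduced to lists of ANDed keywords; the function name search and its parameters are hypothetical.

```python
def search(topics, topic_terms, text_terms):
    """topics maps topic name -> topic text; each term list is ANDed."""
    def has_all(terms, haystack):
        # Case-insensitive substring test for every term.
        haystack = haystack.lower()
        return all(term.lower() in haystack for term in terms)

    # Stage 1: keep only the topics whose name matches the topic terms.
    candidates = [name for name in topics if has_all(topic_terms, name)]
    # Stage 2: search the text of the surviving topics only.
    return [name for name in candidates if has_all(text_terms, topics[name])]
```

Doing the topic-name pass first means the more expensive text search runs over fewer files, which matters given the load and run time concerns raised here.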

This is database-query level power, and the cost is the parsing. The internal ``grep'' is pretty easy in Perl...but the more we do, the bigger/slower-to-load/slower-to-run wikisearch.pm gets...

...unless we implement a twiki search server that runs as a daemon or service...hmmmmm...nah.

-- KevinKinnell - 06 Jun 2000

AND / OR search has also been discussed in SearchEnhancements and SearchEnhancmentsRFC; it is certainly a useful enhancement. I like the Altavista style search because it is easy to remember and fast to type:

+word +"consecutive words" -exclude
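To make the syntax concrete, here is a minimal Python sketch of how such a query could be split into required, excluded, and optional terms. It is an illustration under assumptions, not a description of how Altavista or TWiki handle it: terms are matched as case-insensitive substrings, and terms without a sign only matter when no required term is present. The names parse_altavista and matches are hypothetical.

```python
import re

# An optional sign, followed by either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_altavista(query):
    """Split a query into (required, excluded, optional) term lists."""
    required, excluded, optional = [], [], []
    buckets = {"+": required, "-": excluded, "": optional}
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if term:  # skip empty phrases such as ""
            buckets[sign].append(term)
    return required, excluded, optional

def matches(query, text):
    """Substring match: no excluded term, and all required terms present."""
    required, excluded, optional = parse_altavista(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    if required:
        return all(term in text for term in required)
    return any(term in text for term in optional)
```

A quoted phrase stays together as one term, so "consecutive words" only matches where the two words are adjacent in the text.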

-- PeterThoeny - 29 Aug 2000

For those who need the AND search functionality on a Unix box, I can offer a quick and easy solution.

First, get the tool agrep from ftp://ftp.cs.arizona.edu/agrep; the tool can be used freely for any purpose, and redistribution is allowed on a nonprofit basis. On my Linux system, it compiles out of the box.

Change your local bin/wikicfg.pm (or lib/TWiki.cfg, if you are using the beta version) as follows:

$egrepCmd         = "/usr/local/bin/agrep -d ZZZ";

Change the path to what your system needs. It is important to change the ZZZ to some pattern that is not used in any of your documents. agrep searches on a per record basis and a record is delimited by what you set with the above option. Thus, if the ZZZ does not appear in your file, the whole file is treated as one record. Usually you want to find entries where two patterns occur in the same file, but not necessarily on the same line.
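The effect of the record delimiter can be pictured with a small Python sketch. It models only the idea of records and is not agrep itself: it uses plain substring tests, and the function name any_record_has_all is hypothetical.

```python
def any_record_has_all(text, terms, delimiter):
    """True if some single record contains every term.

    Records are the pieces of text between occurrences of the delimiter.
    If the delimiter never occurs, the whole text is one record.
    """
    return any(
        all(term in record for term in terms)
        for record in text.split(delimiter)
    )

doc = "blue sky\ngreen grass\n"

# One record per line: no single line holds both words.
per_line = any_record_has_all(doc, ["blue", "green"], "\n")

# A delimiter that never occurs: the whole file is one record.
whole_file = any_record_has_all(doc, ["blue", "green"], "ZZZ")
```

Here per_line is False and whole_file is True, which is why the delimiter has to be a string that appears nowhere in your topics.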

When using the advanced searching, choose 'regular expression' search (this will invoke agrep). The search pattern

pattern1;pattern2

means pattern1 AND pattern2,

pattern1,pattern2

means pattern1 OR pattern2.
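The semantics of the two separators can be sketched in a few lines of Python. This models only the AND/OR meaning described here, with each pattern treated as a literal substring; it does not reproduce agrep's pattern syntax. Rejecting queries that mix both separators is an assumption of the sketch, not a statement about agrep, and the name agrep_like is hypothetical.

```python
def agrep_like(pattern, record):
    """Evaluate 'p1;p2' as AND and 'p1,p2' as OR against one record."""
    if ";" in pattern and "," in pattern:
        raise ValueError("this sketch does not handle mixed ';' and ','")
    if ";" in pattern:
        # AND: every sub-pattern must occur somewhere in the record.
        return all(term in record for term in pattern.split(";"))
    # OR (or a single pattern): at least one sub-pattern must occur.
    return any(term in record for term in pattern.split(","))
```

Combined with a record that spans the whole file, agrep_like("blue;green", text) is true whenever both words occur anywhere in the topic.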

For more information about what patterns are allowed, consult the included manual page agrep.1.

-- GuidoOstkamp - 16 Jun 2001

I had a go at agrep, thanks for the pointer. It's fine for searching for keywords, but you can't combine the AND feature (actually the -d option) with regular expressions, so I implemented a grep variant that does REs as well as AND. See CategorySearchForm for more details and (soon) an upload of the script. It works quite nicely - I just use pattern1;pattern2 like agrep.

-- RichardDonkin - 08 Jul 2001

TopicClassification:

FeatureEnhancementRequest

Topic revision: r3 - 2001-07-08 - RichardDonkin
