For categorized webs like a TicketWiki it would be very useful to do complex searches on categories.
The form should be created directly from the catitems template.
For example, the Know.WebSearch would have something like this:
Proposed rules for converting the catitems to a search / form:
- make selects, radiobuttons and checkboxes into checkboxes
- leave text fields as they are, possibly allowing RegularExpressions
- OR the checkboxes of each category, i.e. checking OsHPUX and OsSunOS yields pages for either Sun OR HP
- AND all selected categories, i.e. checking OsHPUX and PublicFAQ yields only HP FAQs
- ignore empty categories
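The rules above amount to a simple filter: OR within one category, AND across categories, and skip categories where nothing is checked. Here is a minimal Perl sketch of that logic. It assumes each topic's category values have already been parsed into a hash with one value per category; the names topic_matches, Os, Type and Area are hypothetical and not part of TWiki.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed checkbox rules.
# $topic:    category name => value stored in the topic
# $selected: category name => arrayref of checked values
sub topic_matches {
    my ($topic, $selected) = @_;
    for my $cat (keys %$selected) {
        my @checked = @{ $selected->{$cat} };
        next unless @checked;              # ignore empty categories
        my $value = $topic->{$cat};
        return 0 unless defined $value;    # topic lacks this category
        # OR within a category: any checked value is enough
        return 0 unless grep { $_ eq $value } @checked;
    }
    return 1;                              # AND across all selected categories
}

my %selected = (Os => [qw(OsHPUX OsSunOS)], Type => ['PublicFAQ'], Area => []);
print topic_matches({ Os => 'OsHPUX', Type => 'PublicFAQ' }, \%selected)
    ? "match\n" : "no match\n";
```

Checking OsHPUX and OsSunOS under Os and PublicFAQ under Type keeps HP and Sun FAQs, drops everything else, and the empty Area category has no effect.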
Possibly, we could prefix each category with boolean operators like
On the other hand, usability studies (see http://www.useit.com/alertbox/9707b.html) consistently show that most users do not really understand boolean searches.
The first model works like the professional ticketing tool from http://www.quintus.com that I happen to know.
On yet another hand, the target audience for such a mechanism is not a user looking for a specific page, but rather an expert who wants to keep an overview of huge numbers of topics.
So we might add it, if it is easy to implement.
So what do you think? Does that make sense?
--
PeterKlausner - 08 Jun 2001
(btw - won't the variable %WIKIUSERNAME% change with each subsequent save?)
Actually %WIKIUSERNAME% changes with each view.
--
NicholasLee - 08 Jun 2001
I'm implementing some FormTemplateSystem pages that use embedded searches a lot. While there is no user interface as suggested on this page (often called query by forms or query by example in the DB world, by the way), I am going to implement AND searching. This will be for embedded searches only, and will require regular expressions, but will let me do things like find pages where the status field is Approved and SubjectArea is Security. The idea is that navigating to the SecurityArea page will show you all Approved and Draft entries in that area.
I am planning to allow additional attributes for embedded searches, using an and attribute for the first additional criterion, then and2 and so on. These will be implemented using a series of egrep -l commands, probably combined with xargs (which should be used in any case, since the current code can blow up with very large numbers of topics by exceeding the maximum argument length for a Unix/Linux command). This is not elegant, but it should work, and the performance hit should be fairly reasonable.
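The "series of egrep -l commands plus xargs" idea can be sketched in Perl as below. This is an illustration under assumptions, not the planned TWiki code: it calls grep -E (the modern spelling of egrep) through a list-form pipe, batches filenames the way xargs would, and the names grep_l, and_search and the batch size of 200 are all hypothetical.

```perl
use strict;
use warnings;

# Sketch: AND search as a series of "grep -E -l" passes. Each pass scans
# only the files that survived the previous one. Filenames are passed in
# fixed-size batches, as xargs would do, so a huge topic list cannot
# exceed the OS argument-length limit.
my $BATCH = 200;    # arbitrary batch size, chosen for illustration

# Return the files among @files whose contents match the regex $re
sub grep_l {
    my ($re, @files) = @_;
    my @hits;
    while (my @chunk = splice @files, 0, $BATCH) {
        # List-form open runs grep directly; no shell parses the pattern
        open my $fh, '-|', 'grep', '-E', '-l', '-e', $re, '--', @chunk
            or die "cannot run grep: $!";
        chomp(my @found = <$fh>);
        close $fh;    # grep exits 1 when nothing matches; not an error here
        push @hits, @found;
    }
    return @hits;
}

# Keep only the files that match every pattern (AND)
sub and_search {
    my ($files, @patterns) = @_;
    my @candidates = @$files;
    for my $re (@patterns) {
        last unless @candidates;    # nothing left to narrow
        @candidates = grep_l($re, @candidates);
    }
    return @candidates;
}
```

For example, and_search([glob 'data/Know/*.txt'], 'Approved', 'SecurityArea') would return the topic files containing both terms, each possibly on a different line (the data/Know path is only an example).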
I did have a look at agrep, mentioned in SearchEnhancementsWithAndOr, which promised to let me do this with no Perl hacking. However, agrep's -d option (required to match AND terms that are not on the same line) doesn't work with regular expressions.
agrep does look nice for non-forms searching with AND/OR.
--
RichardDonkin - 08 Jul 2001
I've now implemented AND searching as a separate Perl script called andgrep - it uses agrep-style syntax but is intended only for use with TWiki. It should be usable with any TWiki release: just install andgrep and point the $egrepCmd setting in TWiki.cfg or wikicfg.pm to this command. Unlike agrep, it lets you use regular expressions, and there are no limits on pattern length when using AND.
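As a rough illustration of that one-line change, the setting might look like this. The install path is only an example; use wherever you put the script, and make sure it is executable by the web server user.

```perl
# In TWiki.cfg (or wikicfg.pm on the May 2000 release): point the
# egrep setting at andgrep. The path below is an example, not a default.
$egrepCmd = "/usr/local/bin/andgrep";
```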
I've only been using this script for a few hours, but it's very simple and seems to work OK. I'll attach it here when I get into work, as I'm having a few VPN problems at the moment. It relies on 'egrep' being in the path, so you may need to edit the script and/or use a new variable in TWiki.cfg if it isn't.
It's particularly useful with embedded searches - you can now search on field1 AND field2, e.g. this search in a web modelled on TWiki.Know (regex=on is required; a space is inserted after '%' because of problems with <verbatim>):
% SEARCH{ "AnswerStatus.*<br><\/td><td>.*Approved;QuestionArea.*<br><\/td><td>.*SecurityArea" regex="on"}%
UPDATE: Now attached as 'andgrep' - I have been using this on a live TWiki today without problems. It now handles topic-name searching with regular expressions. I'm using the March beta, but it should be usable with other releases as well.
--
RichardDonkin - 08 Jul 2001
Great idea Richard to decouple this completely from the TWiki core! Did you do some performance testing?
--
PeterThoeny - 11 Jul 2001
Good idea to keep it separate from the core for now. But shouldn't we consider adding this code to the core in a future release? Reasons:
- Clean interface to storage system
- Documentation clearly specifies capability
- Any plugin or embedded search can assume this functionality is present
--
JohnTalintyre - 12 Jul 2001
I haven't done any performance testing, but it seems reasonably fast, albeit on a very small set of topics. In the worst case, for N regular expressions which occur in all topics, e.g. 'and;the', the search time will be N times that of a single RE, since it runs egrep once for each keyword and the set of filenames scanned will not shrink much from one pass to the next. However, this is quite unlikely - if you search for two reasonably common terms (say the first hits 30% of topics and the second 10%, and you have 1000 topics), the first pass will return 300 filenames, and the second pass will search those 300 files again. The I/O should not be a big problem given a reasonably sized filesystem cache, so the total CPU overhead in this case is 30% over the single RE case. If the less common keyword were searched for first, performance would be better, since it would search only 1100 documents (1000 + 100). If you are doing embedded searches, it's not too hard to optimise the order of keyword terms.
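The arithmetic above can be captured in a tiny cost model: the first pass reads every topic, and each later pass reads only the survivors of the previous one. This sketch assumes the terms' hit rates are independent of each other, which real topics need not satisfy; files_scanned is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rough cost model for multi-pass AND searching: pass 1 scans all $n
# topics, each later pass scans only the survivors of the previous pass.
# Assumes the terms' hit rates are independent of each other.
sub files_scanned {
    my ($n, @hit_rates) = @_;
    my ($total, $remaining) = (0, $n);
    for my $rate (@hit_rates) {
        $total     += $remaining;    # this pass reads every remaining file
        $remaining *= $rate;         # survivors feed the next pass
    }
    return $total;
}

printf "%.0f\n", files_scanned(1000, 0.30, 0.10);   # common term first: 1300
printf "%.0f\n", files_scanned(1000, 0.10, 0.30);   # rare term first: 1100
```

Under this model, ordering the terms by ascending hit rate (sort { $a <=> $b }) minimises the files scanned, and N terms that match every topic cost N full scans, the worst case described above.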
As for putting this in the core - sounds like a good idea in the longer term, rather than forking another Perl interpreter. I just kept it separate for ease of debugging. It would also be good to enable AND searching through the forms without using REs - in fact, in some ways it should be the default for multiple word searches:
- jim fred should turn into the /jim;fred/ RE, doing an AND search
- "jim fred" should turn into the /jim fred/ RE, doing a phrase search
RE based searching should probably use the full syntax. Using ';' is not a bad idea since agrep already uses this.
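A sketch of that conversion, assuming the ';'-separated AND syntax that andgrep uses: bare words become separate AND terms, a double-quoted phrase stays together, and regex metacharacters are escaped so words match literally. The helper name query_to_and_pattern is hypothetical, and a ';' inside a quoted phrase would still be split by andgrep.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a search-engine style query into the
# ';'-separated AND pattern used by andgrep. Bare words become separate
# AND terms; a double-quoted phrase stays together as one term.
sub query_to_and_pattern {
    my ($query) = @_;
    my @terms;
    while ($query =~ /"([^"]*)"|(\S+)/g) {
        my $term = defined $1 ? $1 : $2;
        next if $term eq '';
        # Escape regex metacharacters so words are matched literally
        $term =~ s/([.\[\]\\*+?{}()|^\$])/\\$1/g;
        push @terms, $term;
    }
    return join ';', @terms;
}

print query_to_and_pattern('jim fred'), "\n";      # jim;fred
print query_to_and_pattern('"jim fred"'), "\n";    # jim fred
```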
It would be good to support agrep as well, as an alternative grep - this may be better for non-RE AND searching since it does a single pass.
I'd be interested to hear other people's experiences with andgrep - as long as your egrep is in the path, you can try it out immediately without altering TWiki code, just update TWiki.cfg (Dec 2000 onwards) or wikicfg.pm (May 2000).
--
RichardDonkin - 12 Jul 2001
A couple of observations:
- Doesn't seem to work on Windows - I haven't found out why yet
- Any idea how to pass ";" to the search script in a URL? ";" is as much a separator as "&", so it can't be used directly in a value.
--
JohnTalintyre - 13 Jul 2001
The Windows problems may be related to this line in the script, which assumes Unix line endings:
# Get rid of the \n after every filename
chomp(@filenames);
You may need to strip \r\n instead if that's what the backticks return on Windows. Or it could be something else; try turning on the debug output.
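If the Windows problem really is stray "\r" characters (only one of the possibilities mentioned), a substitution that removes either line ending could replace the chomp. Sketch only; strip_line_endings is a hypothetical wrapper, and inside andgrep the single line s/\r?\n\z// for @filenames; would be enough.

```perl
use strict;
use warnings;

# Remove a trailing "\n" or "\r\n" from each filename, so the same code
# works whether the command output uses Unix or Windows line endings.
sub strip_line_endings {
    my @names = @_;
    s/\r?\n\z// for @names;
    return @names;
}

my @filenames = strip_line_endings("a.txt\n", "b.txt\r\n", "c.txt");
print "$_\n" for @filenames;
```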
As for semicolons, you need to use %3B, the URL encoding for semicolon, as mentioned here and in RFC 1738 on URLs. I just tried doing a search from WebSearch with the March beta, typing in 'foo;bar', and it worked fine, just like the embedded searches, so you'll only need this if you are constructing your own URLs in Perl.
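When building such URLs in Perl, percent-encoding the search value turns ';' into %3B. A minimal sketch without external modules; url_encode is a hypothetical helper, and the search URL layout in the example is illustrative rather than the exact TWiki script path.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved characters, so that ';' in
# a search string becomes %3B and is not taken as a parameter separator.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

my $query = url_encode('foo;bar');
print "search?search=$query&regex=on\n";    # search?search=foo%3Bbar&regex=on
```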
--
RichardDonkin - 15 Jul 2001
Richard - thanks.
I've added your code directly into my copy of Search.pm - only a few lines change. I'm happy to put it into CVS when Peter gives the green light. However, I think it would be better to be compatible with most search engines and switch, as some people have suggested, to space being the AND separator, with literals enclosed in double quotes.
--
JohnTalintyre - 18 Jul 2001
I agree with the idea of space meaning AND, etc., and in fact suggested this above.
It would be good to see this in the core since it's a fairly low-impact change. However, using space to mean AND is not backwards-compatible, of course, so we'd need a bit of discussion to see how non-embedded search usage might be affected.
--
RichardDonkin - 19 Jul 2001
I am not sure if this was possible some other way, but I have modified the andgrep to allow a search to use AND NOT. I needed this for a ticket system internally.
You simply put a '-' at the beginning of the term you do NOT want to find:
% SEARCH{ "AnswerStatus.*<br><\/td><td>.*Approved;-QuestionArea.*<br><\/td><td>.*SecurityArea" regex="on"}%
The change to andgrep is:
# A leading '-' means AND NOT: strip it and list non-matching files
if ($simplePattern =~ s/^-//) {
    $doInvertMatch = "-L";
} else {
    $doInvertMatch = "";
}
$cmd = "egrep $doIgnorecase $doListfiles $doInvertMatch '$simplePattern' @filenames";
Thought it might help someone, somewhere.

Any suggestions on a better way to implement this?
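One alternative, sketched here as an illustration rather than a drop-in andgrep patch: evaluate the whole ';'-separated pattern in Perl, treating a leading '-' as "must not match" (the egrep -L case). It works on topic texts held in memory, so it needs no shell and no special case for an empty file list. The name filter_topics and the sample topics are hypothetical.

```perl
use strict;
use warnings;

# Sketch: evaluate an andgrep-style pattern such as "Approved;-SecurityArea"
# against topic texts held in memory. Terms are separated by ';'; a leading
# '-' keeps only topics that do NOT match the term (like egrep -L).
sub filter_topics {
    my ($pattern, $topics) = @_;          # $topics: name => text
    my @names = sort keys %$topics;
    for my $term (split /;/, $pattern) {
        my $invert = $term =~ /^-/ ? 1 : 0;
        (my $re_text = $term) =~ s/^-//;
        my $re = qr/$re_text/;
        @names = grep { ($topics->{$_} =~ $re ? 1 : 0) != $invert } @names;
    }
    return @names;                        # empty list if nothing survives
}

my %topics = (
    QuestionOne => "AnswerStatus: Approved\nQuestionArea: SecurityArea",
    QuestionTwo => "AnswerStatus: Approved\nQuestionArea: NetworkArea",
);
print join(', ', filter_topics('Approved;-SecurityArea', \%topics)), "\n";   # QuestionTwo
```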
--
AdrianLynch - 20 Nov 2001
The following change fixes an error if no results are found. It also changes the 'not' character to \!. I have used this change successfully on our intranet for job tracking.
if ($simplePattern =~ /^\\\!/i) {
    if (@filenames) {
        $simplePattern =~ s/\\\!//;
        $doInvertMatch = "-L";
    } else {
        die; # No files found to filter.
    }
} else {
    $doInvertMatch = "";
}
$cmd = "egrep $doIgnorecase $doListfiles $doInvertMatch '$simplePattern' @filenames";
--
AdrianLynch - 30 Jan 2002
Thanks for posting the updates - I missed these due to lack of ConversationTracking.
In any case, the form-based web where I was using this didn't really get used due to staff changes...
You might also want to look at GNU bool, linked at the end of SearchEngineVsGrepSearch - this implements AND, NOT and proximity searching, and works fine with TWiki (just change $egrepCmd etc. to bool). It's probably faster than andgrep, but requires a C compiler (try CygWin if on Windows).
--
RichardDonkin - 09 Feb 2002
Just wanted to add my support for andgrep. After finally concluding that searches can't span lines (this could be better documented, particularly regarding the use of webforms), and just before giving up on being able to use webforms, I discovered andgrep.
Now we can use webforms and searches to generate the desired index pages.
One of the dozen or so Webs that we're implementing is Software. Within this Web there will be dozens of unique projects. I wanted to use a TopicClassification similar to TWiki's WebForm, but needed to add another field to identify the project. Now, to generate bug summaries, etc., unique to each project, the SEARCH needs to span multiple lines. This isn't possible with TWiki out of the box.
andgrep works beautifully for this purpose.
If for no other reason than to support better use of webforms, TWiki should offer multi-line searches.
--
MartyBacke - 18 March 2002
I've changed this to a FeatureUnderConstruction. I have built search using ";" to do "AND" into my copy of TWiki.pm and am almost ready to upload it to CVS.
--
JohnTalintyre - 18 Mar 2002
Yes, let's take the AND functionality into the core code. It's probably better in TWiki::Search so that it works from a search form and also from embedded searches.
Compatibility is probably not an issue for regex search, since the chance that existing text already uses ";" is low.
For non-regex search it would be nice to have search-engine-style syntax, e.g. "good food" +Sushi +Hamachi -Maguro. The question here is compatibility with existing text. A possible solution is a new switch that enables this type of syntax.
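A sketch of how such a query could be parsed, assuming plain words and '+' words are both required (the "space means AND" idea from earlier on this page) and '-' marks an excluded term. The parser name parse_query is hypothetical; the last line renders the result in the andgrep style, with ';' for AND and a leading '-' for AND NOT as in the AND NOT change above.

```perl
use strict;
use warnings;

# Hypothetical parser for search-engine style queries such as
#   "good food" +Sushi +Hamachi -Maguro
# Quoted phrases stay together; a '-' prefix marks a term that must NOT
# appear. Plain and '+' terms are both treated as required.
sub parse_query {
    my ($query) = @_;
    my (@must, @must_not);
    while ($query =~ /([+-]?)(?:"([^"]*)"|(\S+))/g) {
        my $sign = $1;
        my $term = defined $2 ? $2 : $3;
        next if $term eq '';
        if ($sign eq '-') { push @must_not, $term }
        else              { push @must, $term }
    }
    return (\@must, \@must_not);
}

# Express the result in andgrep-style syntax: ';' for AND, '-' for AND NOT
my ($must, $must_not) = parse_query('"good food" +Sushi +Hamachi -Maguro');
print join(';', @$must, map { "-$_" } @$must_not), "\n";   # good food;Sushi;Hamachi;-Maguro
```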
A QueryByExampleSearch would be nice too; in fact, I have a need for that at work.
--
PeterThoeny - 22 Mar 2002
Another small enhancement: SearchScriptWithFormattedSearch
--
PeterThoeny - 22 Mar 2002
I've now added the ability to use AND in a regex search to CVS - SearchWithAnd. AND is represented by ";" as discussed above.
--
JohnTalintyre - 23 Mar 2002
I've noticed a security hole in andgrep. Since it doesn't check for ' in the search string, you can terminate the quoted grep pattern and then execute arbitrary commands on the host machine. I haven't managed to get this to do anything other than give me a word count (wc) on the Wiki files in the 10 minutes I've been playing with it, but I'm pretty sure that with a little patience you could do just about anything the web server's user can do. I'm updating my andgrep script to change ' to . for now; I'll look into a more robust solution and post it here unless someone posts a better one first.
To test the hole, try running a WebSearch on '; wc ' (you must enter the ' as part of the search string) with regex = "on" on any TWiki with andgrep installed.
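A more robust fix than replacing ' is standard single-quote escaping for a POSIX shell: wrap the whole argument in single quotes and turn every embedded ' into '\'' (close the quote, add an escaped quote, reopen). This sketch assumes a Bourne-style shell; shell_quote is a hypothetical helper. Where possible, passing arguments as a list to Perl's system or open avoids the shell entirely.

```perl
use strict;
use warnings;

# Quote a string as ONE argument for a POSIX shell command line:
# wrap it in single quotes and turn each embedded ' into '\''.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

my $search = q('; wc ');
print "egrep -l ", shell_quote($search), " *.txt\n";
```

With this quoting, the '; wc ' search string reaches egrep as a literal pattern instead of ending the command.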
--
BobbyMartin - 30 Oct 2002
Nope, the security hole seems to be a general TWiki hole. I'm running the Dec 2001 release. I turned off my use of andgrep and the wc hole remains. It appears to be fixed in the code running here on twiki.org, though.
--
BobbyMartin - 30 Oct 2002
I do not understand why ' should be a command separator. Could you explain?
--
PeterThoeny - 31 Oct 2002
The andgrep script has been superseded by andgrep-based functionality that is now included in the TWikiAlphaRelease, so use that instead if you can.
--
RichardDonkin - 01 Nov 2002