DeleteMe petition: please keep this page for historical reasons -- DanielKabs - 14 Apr 2005
Features are implemented (SearchScopeForTopicAndText, KeywordSearchWithImplicitAnd) and the content is no longer relevant.
The same topics are covered in SearchDoesNotWorkAsExpected. -- ArthurClemens - 16 Aug 2003
SearchSuggestion
This topic mentions 3 improvements to TWiki search performance and usage. One solution is offered, the other 2 are still pending AFAIK.
Problem
Context
Finding and accessing stored information is a substantial part of
TWiki. The "search" element provides that functionality.
In my opinion its implementation is rather poor and
impairs TWiki's usability.
Example
Try for yourself: enter "search text". If you have JavaScript enabled, the words are filled in automagically. The plain text search (which can be included on any page by cutting and pasting a bunch of strange HTML code) should (at least) return this page as a result.
But it doesn't.
Problems
The TWiki search does not search the topic names, only the topic bodies.
Worse, it searches for the literal occurrence of the exact phrase entered.
This is contrary to the usual syntax of popular search engines,
which search for documents that contain all keywords.
Usually, one uses quotation marks to indicate a phrase search.
Solution
I therefore recommend:
- Plain text search should search both topic names and bodies.
- Any text search should search for every word entered and not for the exact phrase, as long as "search regular expression" is unchecked, i.e. text search should mimic the standard behaviour one is used to from popular search engines.
- Document the /cgi-bin/search/Codev/ CGI script and its features: how do I use it on my own pages?
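The first two recommendations can be illustrated with a minimal sketch. It is not TWiki's implementation (TWiki itself is written in Perl); the function names `parse_query`, `matches` and `search`, and the idea of holding topics in a dictionary, are assumptions made only for this example. Matching is by case-insensitive substring, so "text" would also match "context".

```python
import re

def parse_query(query):
    """Split a query into lower-case terms; text in double quotes stays one phrase."""
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        terms.append((phrase or word).lower())
    return terms

def matches(topic_name, topic_body, query):
    """True if every term occurs in the topic name or in the topic body."""
    terms = parse_query(query)
    if not terms:
        return False  # an empty query matches nothing
    haystack = (topic_name + "\n" + topic_body).lower()
    return all(term in haystack for term in terms)

def search(topics, query):
    """topics maps topic name -> body text; returns the matching names, sorted."""
    return sorted(name for name, body in topics.items()
                  if matches(name, body, query))
```

With a topic named SearchSuggestion whose body mentions "text" but never the exact phrase, the query `search text` finds it (implicit "and", topic name included), while the quoted query `"search text"` does not.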
Related Issues about searching TWiki :
Update: --
DanielKabs - 12 Feb 2003,
MattWilkie - 06 Feb 2003
Discussion
Regarding "Plain text search should search both topic names and bodies": this could be done with a new scope="all" switch (the existing ones are scope="topic" and scope="text").
Other search enhancements are described in the topics you found.
The search script parameters are documented in TWikiVariables; they are identical to those of the %SEARCH{...}% variable.
--
PeterThoeny - 24 Feb 2001
The listed suggestions date from last year. I wonder why your
suggestion of implementing the AltaVista-like search style
hasn't managed to get into the
FeatureToDo TopicClassification.
What is the current status of the search feature debate?
In my opinion a powerful search function is as important
as a search syntax that is familiar to users. It is
hard
enough to get people to use Wiki, so why not ease acceptance
by offering an AltaVista-like search syntax?
Cheers
Daniel
Moved over from: NativeVersionFormat [ EdgarBrown - 24 Mar 2001 ]
If you make a switch to DBM, please consider how an external search engine can index / search the content.
I plan to add an external search engine to my TWiki site (under construction) that would search and index the plain text files (.txt) to provide more extensive searching capability than TWiki currently provides (proximity searches, for example: <word_1> w/10 <word_2>, meaning word_2 within 10 words of word_1, etc.).
Struggling (slightly) now with the order of the previous: should it be (do I prefer) "word_1 within 10 words of word_2"? The order doesn't really matter here for w/10, but if we define a "within n words before (or after)" option then we do need the order in the most intuitive way. -rhk
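To make the w/10 idea and the ordering question concrete, here is a small sketch of a proximity test. It does not describe any of the search engines mentioned; the function name `within` and its `ordered` flag are hypothetical, and words are compared case-insensitively after a simple split on word characters.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into a list of words."""
    return re.findall(r"\w+", text.lower())

def within(text, word_1, word_2, n=10, ordered=False):
    """True if word_2 occurs within n words of word_1.

    With ordered=True, word_1 must come before word_2.
    """
    tokens = tokenize(text)
    positions_1 = [i for i, t in enumerate(tokens) if t == word_1.lower()]
    positions_2 = [i for i, t in enumerate(tokens) if t == word_2.lower()]
    for i in positions_1:
        for j in positions_2:
            distance = j - i
            if ordered and 0 < distance <= n:
                return True
            if not ordered and 0 < abs(distance) <= n:
                return True
    return False
```

The unordered form treats `within(text, "a", "b")` and `within(text, "b", "a")` the same, which is why the order only starts to matter once a "before" or "after" variant exists.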
When I find the right search engine (Alta Vista personal, htdig, zyindex, google, ???), this will be easy as long as the content is stored in plain text files. As a Linux / twiki / Perl newbie, I am not sure whether this approach can be made workable if the content is stored in a database.
Aside: Why am I planning to search the "raw" content in the .txt files instead of the "cooked" html? I'm not entirely sure. I suspect some of those search engines may be able to better deal with real plain text files rather than dynamically created
HTML files. (And, I suspect some might have a problem with the "dynamic" files and others might have trouble with the
HTML.)
And, the plain text files don't have to struggle to ignore the HTML tags — ignoring the HTML tags would (usually?) be the preferred behavior. -rhk
(Of course, if DBM is just an option, and I still have the ability to choose to store the content in .txt files, I can stick with that option. Still, if the DBM format provides other advantages, it would be nice to use it.)
--
RandyKramer - 24 Mar 2001
As I understand it, most search engines are nothing more than a ton of Perl code that generates the indexes (normally a very time-consuming process), plus functions that use those indexes to generate search results.
In the case of TWiki, we might want to use some of that Perl code, but modify it to suit our needs. For example:
most of the information on the TWiki is static, but when a page is edited we need to redo the indexes, otherwise searches would not work for recently modified pages. That means we have to use some sort of continuous indexing, where the index is updated to reflect the new file whenever a page is edited.
Thus, at a bare minimum, we should be able to remove the previous page contents from the index and add the new page contents to it without triggering a full re-index. That procedure should be simplified by TWiki's reliance on
RCS (or CVS), and I guess it is something most search engines won't do.
--
EdgarBrown - 24 Mar 2001
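The "remove the old contents, add the new contents" step described above can be sketched as a tiny inverted index. This is only an illustration under stated assumptions: the class name `TopicIndex` and its methods are hypothetical, and instead of reading the previous revision from RCS the sketch simply remembers which words it indexed for each topic.

```python
import re
from collections import defaultdict

class TopicIndex:
    """Inverted index that can be updated one topic at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> names of topics containing it
        self.words_by_topic = {}          # topic name -> words currently indexed

    def remove(self, name):
        """Drop every posting that belongs to one topic."""
        for word in self.words_by_topic.pop(name, set()):
            self.postings[word].discard(name)
            if not self.postings[word]:
                del self.postings[word]

    def update(self, name, text):
        """Call after a topic is saved: remove the old contents, add the new."""
        self.remove(name)
        words = set(re.findall(r"\w+", text.lower()))
        self.words_by_topic[name] = words
        for word in words:
            self.postings[word].add(name)

    def search(self, query):
        """Names of topics that contain every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = None
        for word in words:
            hits = self.postings.get(word, set())
            result = set(hits) if result is None else result & hits
        return result
```

Calling `update` from the save path keeps the cost of an edit proportional to the size of the edited topic rather than to the size of the whole web.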
Solution 1: Plain text search should search both topic names and bodies
Solution 2: Search should search for every word entered and not for the exact phrase
Technically this is solved with
RegularExpression search (semicolon ';' for "and"), but this is not very user friendly.
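As a rough illustration of what "semicolon for and" means, the sketch below treats a query as several regular expressions separated by ';' and requires all of them to match the topic body. It is not TWiki's code: the function name `regex_and_search`, the dictionary of topics and the case-insensitive matching are assumptions made for the example.

```python
import re

def regex_and_search(topics, pattern_list):
    """Return sorted names of topics whose body matches every ';'-separated regex."""
    patterns = [re.compile(p, re.IGNORECASE)
                for p in pattern_list.split(";") if p]
    if not patterns:
        return []
    return sorted(name for name, body in topics.items()
                  if all(p.search(body) for p in patterns))
```

A query such as `soap;wsdl` then returns only topics that contain both words, in any order, which is what a keyword search with implicit "and" would offer without requiring the user to know the separator.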
Solution 3: Document the CGI-script and its features
Pending...
Category:
TWikiPatches
search test blurb:
I recommend to use only small amounts of soap for cleaning the hands. Do you know if wsdl is a
web service still in use?