I've been wondering about more capable searching, e.g. auto-generating a search for FormTemplateSystem, which would require AND capability.
In an idle moment I changed Search.pm so it could search using pure Perl (with topics fully read by the Store routine). Conclusions and thoughts:
- Code change was very simple
- On my home test TWiki installation (Windows 2000) the performance was not noticeably different from using grep. The only big web was TWiki, with both searches taking 2-3 seconds.
- It was then trivial to add AND searching and a search that covers both topic name and topic content.
- Perhaps this isn't too surprising, as matching in Perl is well optimised.
- This implementation could be made faster by matching each line as it's read and abandoning the file read once the search matches, but this will only help where there are many topics that match.
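As a rough illustration of the last two points, here is a minimal sketch in Python (not the Perl that went into Search.pm). It assumes topics are stored as `*.txt` files in one directory per web; the function names `topic_matches_all` and `search_web` are made up for this example. Every term must match either the topic name or some line of the topic text, and the file read is abandoned as soon as all terms have been seen.

```python
import re
from pathlib import Path


def topic_matches_all(path, terms):
    """Return True if every term matches the topic name or a line of its text."""
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in terms]
    topic_name = Path(path).stem
    # Terms already satisfied by the topic name need no file read at all.
    pending = [p for p in patterns if not p.search(topic_name)]
    if not pending:
        return True
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # Keep only the terms that have still not been seen.
            pending = [p for p in pending if not p.search(line)]
            if not pending:
                return True  # all terms found: abandon the read early
    return False


def search_web(web_dir, terms):
    """Return the sorted names of topics in web_dir that match all terms."""
    return sorted(
        p.stem for p in Path(web_dir).glob("*.txt") if topic_matches_all(p, terms)
    )
```

For example, `search_web("data/TWiki", ["form", "template"])` would list the topics whose name and text together mention both words (the directory path is only an example).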
I've coded it so that if the egrep/fgrep value from TWiki.cfg is false, it switches to the Perl-based search.
Shall I add it to the core? If so, and everyone finds it as fast as grep, then we could drop the grep dependency.
--
JohnTalintyre - 23 Jun 2001
Would be excellent if there is no noticeable performance hit! What kind of commands do you specify for an AND search?
Could you do some timing with writeDebugTimes? Keep in mind that the biggest performance hit with the current search is not grep (one system call) but the external rcs calls (hundreds of calls), and those will go away once we get that information from the topic meta data. It would be interesting to know the timings of the grep search vs. the internal Perl search with rcs calls disabled (for tests, change the Store routine to return the same dummy default values for version and author).
If there is no noticeable performance hit we can put it into the core. In that case we should do the search in TWiki::Store so that later on a different back-end can be created without impacting the rest of the code. The search function in Store should return a list of hits with the topic info (topic name, timestamp, version, author) so that each file needs to be opened only once.
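To make the proposed interface concrete, here is a minimal sketch in Python (again not actual TWiki code). It assumes each topic is a `*.txt` file carrying a meta line of the form `%META:TOPICINFO{author="..." date="..." version="..."}%` with `date` as epoch seconds; the `Hit` record and the `search_store` function are hypothetical names. Each file is read once, and the same text is used for both matching and extracting the topic info.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumed layout of the topic-info meta line; adjust to the real format.
INFO_RE = re.compile(r"^%META:TOPICINFO\{(.*)\}%", re.MULTILINE)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Hit:
    topic: str
    timestamp: int
    version: str
    author: str


def search_store(web_dir, pattern):
    """Return a Hit for every topic in web_dir whose text matches pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(web_dir).glob("*.txt")):
        # Single read per topic: used for the match and for the topic info.
        text = path.read_text(encoding="utf-8", errors="replace")
        if not regex.search(text):
            continue
        info_match = INFO_RE.search(text)
        info = dict(ATTR_RE.findall(info_match.group(1))) if info_match else {}
        date = info.get("date", "")
        hits.append(
            Hit(
                topic=path.stem,
                timestamp=int(date) if date.isdigit() else 0,
                version=info.get("version", ""),
                author=info.get("author", ""),
            )
        )
    return hits
```

Because callers only see the list of `Hit` records, a different back-end could later replace the file scan without changing the code that renders results.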
--
PeterThoeny - 23 Jun 2001
I'll try to do some timings over the next couple of days. With version and author in meta tags there's no RCS hit (I think). That is an advantage of upgrading all topics to meta format, rather than waiting for individual saves. For AND I simply looped around a match //, breaking out if there was a match.
--
JohnTalintyre - 24 Jun 2001
Oh well, had to be too good to be true.
On a Solaris box I found grep about 10 times faster than perl for a large web.
So the only plus points are that the functionality is easier to change, and that it could be useful as an option to save people getting grep working on Windows.
--
JohnTalintyre - 27 Jun 2001
- I've never had a problem getting grep to work on Windows.
- I might really want AND-type functionality, and when I do I might be willing to wait.
How about making it part of the AdvancedSearch capability?
--
MartinCleaver - 27 Jun 2001
Even though I still want to take a stab at an AdvancedSearch implementation, and I'm convinced that it should probably be the next milestone after TWikiReleaseSpring2001 is out (but I have no time to put into that in the near future), here are my two cents.
An easy way to have the AND / OR capability even for this upcoming release is to use repeated greps on each of the ANDed terms. If the performance hit wrt Perl is as you say, that would reduce it by a factor of 5 for most searches.
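The repeated-grep idea can be sketched as follows, in Python for illustration only. The helper names `grep_files` and `and_search` are hypothetical, and the sketch assumes a `grep` that supports `-l`, `-F` and `-e`. Each term costs one grep call, and each call only looks at the files that survived the previous terms.

```python
import subprocess


def grep_files(term, files):
    """Return the subset of files that contain term, using one `grep -l` call."""
    if not files:
        return []  # avoid calling grep with no file arguments (it would read stdin)
    result = subprocess.run(
        ["grep", "-l", "-F", "-e", term, "--", *files],
        capture_output=True,
        text=True,
    )
    if result.returncode > 1:  # 0 = matches, 1 = no matches, >1 = error
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()


def and_search(terms, files, matcher=grep_files):
    """Narrow the file list term by term; the result contains all terms."""
    candidates = list(files)
    for term in terms:
        candidates = matcher(term, candidates)
        if not candidates:
            break  # nothing left to search, skip the remaining terms
    return candidates
```

Putting the rarest term first keeps the later grep calls small. An OR search would instead run each term against the full file list and take the union of the results.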
--
EdgarBrown - 27 Jun 2001
The AND searching is now implemented as a separate Perl script called andgrep, easily used with any TWiki version. See CategorySearchForm for more.
--
RichardDonkin - 17 Jul 2001