
Feature Proposal: Delegate More Processing To Search Algorithm

Motivation

During my development of the Kino Search Algorithm in SearchEngineKinoSearchAddOn, it became obvious that the TWiki core needs to delegate more choices to the Search Algorithm.

This work may be interwoven with some of the ResultSet and ExtractAndCentralizeFormattingRefactor work.

Description and Documentation

In TWiki 4.2.2, each SEARCH calls a very naive pluggable function once per web:

SearchAlgorithm::search ( $searchString, $topics, $options, $sDir, $sandbox, $web )

where $options contains only scope, type, casesensitive, and wordboundaries, and $topics is a (painfully) created list of topics.

This function then returns a hash mapping topic name to 'extract'. The Search rendering throws the extracts away, keeping only the list of topic names.
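The flow described above can be sketched roughly as follows. This is an illustration only, written in Python rather than TWiki's Perl: `naive_search` and `render_search` are hypothetical names, and topics are passed as an in-memory dict of name to text instead of being read from `$sDir`.

```python
def naive_search(search_string, topics, options, web):
    """Hypothetical stand-in for the per-web pluggable search.

    Returns {topic_name: extract} for every topic whose text matches.
    Only the casesensitive option is modelled here.
    """
    case_sensitive = options.get("casesensitive", False)
    needle = search_string if case_sensitive else search_string.lower()
    results = {}
    for name, text in topics.items():
        haystack = text if case_sensitive else text.lower()
        pos = haystack.find(needle)
        if pos >= 0:
            # A short contextual extract around the first match.
            results[name] = text[max(0, pos - 10): pos + len(needle) + 10]
    return results


def render_search(search_string, webs, options):
    """Mimic the caller: one search call per web, extracts discarded."""
    hits = []
    for web, topics in webs.items():
        matches = naive_search(search_string, topics, options, web)
        # Only the topic names survive; the extracts are thrown away here.
        hits.extend(f"{web}.{name}" for name in sorted(matches))
    return hits
```

The point of the sketch is the last loop: whatever the algorithm computed beyond the topic name never reaches the rendering.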

SearchEngineKinoSearchAddOn (like the Xapian engine I'm working on) can return, incredibly quickly, all the meta information for a topic, including a contextual extract. It can also return non-topics, such as attachments and other external data, which I would love to use.

Impact

WhatDoesItAffect: API, Performance, Refactoring, Search

Implementation

So I propose to refactor the TWiki::Store::SearchAlgorithms and TWiki::Store::QueryAlgorithms APIs (which I understand only Crawford and I have worked with; please pipe up if I've missed you) to:
  1. bring them into one API, where multiple SearchAlgorithms can register themselves as capable of processing a search type (or list of types)
  2. create the UI elements to dynamically add support for enabled 'types' in the WebSearch topic (so we can have checkboxes for attachment, external doc, and Google search)
  3. pass the SearchAlgorithms all the known settings that might allow it to optimise a query (including the format string)
  4. use any information that SearchAlgorithms return in the output rendering, thus leveraging advanced improvements
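As a rough sketch of how the four points might fit together (Python for illustration, not the eventual Perl API): every name below, including `SearchRegistry`, `SearchResult`, `attachment_search` and the `settings` keys, is an assumption made for this example and not part of TWiki.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """One hit; it may be a topic or a non-topic such as an attachment."""
    name: str
    kind: str = "topic"
    extract: str = ""
    meta: dict = field(default_factory=dict)


class SearchRegistry:
    """Single entry point where algorithms register the types they handle."""

    def __init__(self):
        self._by_type = {}

    def register(self, algorithm, types):
        # Point 1: one algorithm may claim one type or a list of types.
        for search_type in types:
            self._by_type.setdefault(search_type, []).append(algorithm)

    def enabled_types(self):
        # Point 2: the UI could build its checkboxes from this list.
        return sorted(self._by_type)

    def search(self, query, search_type, settings):
        # Point 3: the algorithm sees every setting, format string included.
        results = []
        for algorithm in self._by_type.get(search_type, []):
            results.extend(algorithm(query, settings))
        return results


def render(results, fmt):
    # Point 4: rendering uses whatever the algorithm returned.
    return [fmt.format(name=r.name, kind=r.kind, extract=r.extract)
            for r in results]


def attachment_search(query, settings):
    # Hypothetical backend; a real engine would consult its index here.
    return [SearchResult(name="Report.pdf", kind="attachment",
                         extract=f"...{query}...")]
```

Registering `attachment_search` for the type `attachment` and then calling `search("kino", "attachment", settings)` would hand the extract and kind straight to `render`, instead of reducing the result to a topic name.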

For backwards compatibility, the existing search types and scopes will be required to return results identical to those of previous versions of TWiki. This implies that scope=all will not in fact search all data types, but only topic name and topic text.

-- Contributors: SvenDowideit - 19 Aug 2008

Discussion

Great Initiative, Sven!!!

From my studies about twiki performance, I realized that search and store are the worst bottlenecks. I was planning to try out Xapian (it seems to be very fast).

TWiki-5 will fly :)

-- GilmarSantosJr - 19 Aug 2008

Sounds excellent, Sven. The devil is in the detail; it sounds like you will be doing a lot of refactoring in Search.pm (to get rid of those topic lists, for a start).

Ideally I'd like the API fixes to climb higher up the tree so that I can perform multiple-web searches with one call; though that may be a refactoring too far.

-- CrawfordCurrie - 19 Aug 2008

It would be so cool to make it a modern interface using iterators over result sets. I can imagine that most of the current Search.pm simply goes over the fence.

-- MichaelDaum - 19 Aug 2008

Please remember to enter a date in the date of commitment field so the proposal app can work. Added today's date.

-- KennethLavrsen - 11 Sep 2008

I am setting this to parked and no committed developer. Please feel free to flip that and own & implement.

-- PeterThoeny - 2010-08-01

Topic revision: r7 - 2010-08-01 - PeterThoeny
 