TWiki Support Web: OrderSearchResultsMostRelevantFirst (r26 - 17 May 2008 - 08:46:42 - ClifKussmaul)
Tags: search

Question

Is it possible to order the search results in order of relevance? I've spent quite a long time searching for a method, without success, so I presume not.

Obviously, "relevance" is subjective, so the most relevant result for a given search term may differ from one person to another. Much research has been done on this kind of search, so presumably an algorithm that takes relevance into account could be implemented. Google seems to do quite a good job (although its algorithm is proprietary).

When searching for a given term, a page containing that term in its title and many times within its body is likely to be more relevant to most people than a page where the term features only once in the body.

There is some discussion about changes to the order of search results at the link below, but this is all based on either the topic name, or when it was last modified.

SearchOrderAndLimitBehavour

Environment

TWiki version: TWikiRelease04x00x03
TWiki plugins: SpreadSheetPlugin, CalendarPlugin, CommentPlugin, EditTablePlugin, EmptyPlugin, InterwikiPlugin, PreferencesPlugin, ProjectPlannerPlugin, RenderListPlugin, SlideShowPlugin, SmiliesPlugin, TablePlugin, TagMePlugin
Server OS:  
Web server:  
Perl version:  
Client OS: MS Windows XP Service Pack 2
Web Browser: Internet Explorer 6
Categories: Search
-- AndrewWhitefield - 13 Feb 2007

Answer


TWiki's default search does not rank the results. To get ranking, you can spider the TWiki content with an external search engine; see Tag:search.

Also look into content tagging: the TagMePlugin shows tag search results ranked by the number of votes each tag gets.

-- PeterThoeny - 13 Feb 2007

Thanks, it looks like the "Google AJAX Search Plugin" will be best for us to use to get results in a more relevant order.

-- AndrewWhitefield - 19 Feb 2007

This is not difficult to do with the default search.

<form action="%SCRIPTURLPATH{"view"}%/%WEB%/%TOPIC%">
  Find Topics: 
  <input type="text" name="query" size="32" value="%URLPARAM{"query"}%" />
  <input type="submit" class="twikiSubmit" value="Search" />
  <input type="hidden" name="table" value="1" />
  <input type="hidden" name="sortcol" value="0" />
  <input type="hidden" name="up" value="1" />
</form>
%SEARCH{ search="%URLPARAM{"query"}%" nosearch="on"
   header="| *Count* | *Topic* | *Summary* |" 
   format="|  $count(.*?(%URLPARAM{"query"}%).*) | $topic %BR% $date %BR% $wikiname | $summary |" 
}%
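
The idea behind the markup above can also be shown outside TWiki. The following is a minimal Python sketch, not TWiki code: it counts how often the query appears in each topic and lists the topics with the highest count first. The `topics` dictionary and the function name `rank_by_count` are made up for illustration, and the match is case-insensitive, which may differ from how $count treats the query.

```python
import re

def rank_by_count(topics, query):
    """Return (count, name) pairs for topics containing the query, highest count first."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    rows = []
    for name, text in topics.items():
        count = len(pattern.findall(text))
        if count:  # topics without a hit would not appear in the search results
            rows.append((count, name))
    # Sort by count descending, then by name so ties come out in a predictable order.
    return sorted(rows, key=lambda row: (-row[0], row[1]))

# Hypothetical topic texts, for illustration only.
topics = {
    "WebHome": "Welcome. Use the search box to search this web.",
    "SearchTips": "Search tips: search by name, search by text, or search by tag.",
    "Sandbox": "Nothing relevant here.",
}
ranking = rank_by_count(topics, "search")
```

In the TWiki version the sorting step is not part of the search itself: the hidden form fields ask the TablePlugin to sort the rendered table on the count column.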

-- ClifKussmaul - 08 Apr 2008

Thanks Clif, this is a creative way of doing simple ranking. You can use the TablePlugin to pre-sort the table on the first column, which holds the count.

-- PeterThoeny - 08 Apr 2008

If you spend too much time tweaking things, you can do more complex rankings...

Here's code that runs a query and ranks the results by weighting occurrences in the topic name, top-level headings, other headings, and the body.

<!-- 
    form that links back to this page
    - hidden fields specify table, column, & direction to sort
-->
<form action="%SCRIPTURLPATH{"view"}%/%WEB%/%TOPIC%">
  <input type="hidden" name="table" value="1" />
  <input type="hidden" name="sortcol" value="0" />
  <input type="hidden" name="up" value="1" />
  Find Topics: 
  <input type="text" name="query" size="32" value="%URLPARAM{"query"}%" />
  <input type="submit" class="twikiSubmit" value="Search" />
</form>

<!-- 
    results - use URLPARAM to extract query
    - weighted: title (10), top heading (5), other headings (3), other appearances (1)
    - (?i) specifies case-independent regex
-->
%SEARCH{ search="%URLPARAM{"query"}%" nosearch="on"
   header="| *Weight* | *T* | *H1* | *Hn* | *R* | *Topic* | *Summary* |" 
   format="|\
   $percntCALC{$EVAL( \
      10 * $IF($SEARCH((?i)%URLPARAM{"query"}%, $topic),1,0) +\
        5 * $T(R$ROW():C3) +\
        3 * $T(R$ROW():C4) +\
        1 * $T(R$ROW():C5) )}$percnt |\
   $percntCALC{$IF($SEARCH((?i)%URLPARAM{"query"}%, $topic),1,0)}$percnt |\
   $count(.*?(?i)---\+[^+]*?(%URLPARAM{"query"}%)[^\n\r]*.*) |\
   $count(.*?(?i)---\+\+.*?(%URLPARAM{"query"}%)[^\n\r]*.*) |\
   $count(.*?(?i)(%URLPARAM{"query"}%).*) |\
   $topic %BR% $date %BR% $wikiname | $summary |" 
}%

<!-- button to toggle hidden columns - mostly for debugging -->
<input type="button" value="toggle hidden columns" onclick='javascript:toggleColumns([1,2,3,4]);'/>
<script type="text/javascript">
function toggleColumns(cols) {
  var table = document.getElementsByTagName('table')[0];
  var newstyle = (table.rows[0].cells[cols[0]].style.display == 'none') ? '' : 'none';
  for (var c = 0; c < cols.length; c++) {
    for (var r = 0; r < table.rows.length; r++) {
      table.rows[r].cells[cols[c]].style.display = newstyle;
    }
  }
}
toggleColumns([1,2,3,4]);
</script>
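
For readers who want to check the arithmetic, here is a rough Python sketch of the same weighting scheme. It is an illustration under assumptions, not a port of the markup: the function name `weighted_score` is hypothetical, headings are detected line by line from the `---+` prefix, and hits are counted per occurrence, which may not match exactly what $count returns.

```python
import re

# Weights from the comment in the markup above:
# topic name 10, top-level heading 5, other headings 3, any appearance 1.
WEIGHTS = {"title": 10, "h1": 5, "hn": 3, "any": 1}

def weighted_score(name, text, query):
    """Score one topic; `name` is the topic name and `text` its raw topic text."""
    word = re.compile(re.escape(query), re.IGNORECASE)
    counts = {"title": 1 if word.search(name) else 0, "h1": 0, "hn": 0, "any": 0}
    for line in text.splitlines():
        hits = len(word.findall(line))
        counts["any"] += hits  # like column R, this also includes hits inside headings
        heading = re.match(r"---(\++)", line)  # TWiki headings start with ---+, ---++, ...
        if heading:
            level = len(heading.group(1))
            counts["h1" if level == 1 else "hn"] += hits
    return sum(WEIGHTS[key] * counts[key] for key in WEIGHTS)
```

A topic named SearchHelp whose text has the query once in a `---+` heading, once in a `---++` heading, and once in the body would score 10 + 5 + 3 + 3 = 21, because the two heading hits are also counted in the "any appearance" column.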

And yes, I'm being paid by hardware vendors to produce processor-intensive code. :-)

-- ClifKussmaul - 09 Apr 2008

Clif, this is great :-)

On TWikiVMDebianStable it is actually quite fast as well...

-- CarloSchulz - 09 Apr 2008

I'm still tweaking the weighted query code to fix problems and add features, and we've even used it to replace the default WebSearch. The big problem is that the $count and $SEARCH expressions inside the format don't handle multiple keywords well, and I'm thinking it might be better to rewrite TWiki's search code (lib/TWiki/Search.pm) rather than continue to work around it this way. Any thoughts?

-- ClifKussmaul - 11 Apr 2008

btw the toggle button does not work. no reaction at all...

-- CarloSchulz - 11 Apr 2008

Carlo, can you tell me anything more - e.g. what browser do you have? (The toggle button uses JavaScript)

-- ClifKussmaul - 11 Apr 2008

To expand on my earlier comment - the main problem now is with weighting multiple keywords, quoted strings, and negative keywords. To find the query string, the code uses SpreadSheetPlugin's $SEARCH and FormattedSearch's $count, which don't (can't) convert the query to a regex the way VarSEARCH does.
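
To make the conversion problem concrete, here is a small Python sketch of turning a keyword-style query into a single regex. It is only an assumption about what such a conversion could look like, not how VarSEARCH or Search.pm does it; the function name `query_to_patterns` and the rule that a leading `-` marks a negative keyword are illustrative.

```python
import re

def query_to_patterns(query):
    """Split a query into one regex for the wanted terms plus a list of excluded terms."""
    wanted, excluded = [], []
    # Each match is (minus sign, quoted phrase, bare word); a quoted phrase stays one term.
    for minus, phrase, word in re.findall(r'(-?)(?:"([^"]*)"|(\S+))', query):
        term = phrase or word
        if not term:
            continue  # skip empty quotes
        (excluded if minus else wanted).append(term)
    # One alternation, so every counting step can share the same regex.
    pattern = "|".join(re.escape(term) for term in wanted)
    return pattern, excluded
```

With this, a query such as `wiki "search results" -draft` yields one pattern that matches either "wiki" or the phrase "search results", plus the excluded term "draft", which would have to be handled separately (for example by dropping topics that contain it). The pattern is empty when the query has no positive terms, so a caller would need to check for that.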

My hunch is that I need to either:

  1. convert the query into a regex before passing it to anything else, so they all get the same regex.
  2. create a new Plugin based on lib/TWiki/Search.pm - more work, but probably more flexible (and readable) than what I have now.
  3. extend Search.pm, perhaps by extending the $format parameter to include a $contextcount(pre,suf) variable that counts results with given prefix & suffix, or in specific contexts (like the topic title) - this seems most parsimonious. Syntax to count in H1 headers might look like: $contextcount(---\+[^+]*?)([^\n\r]*?)

I'm leaning toward the last option. Does this make sense? Comments or suggestions, anyone?

-- ClifKussmaul - 18 Apr 2008

I've added two variables to $format in Search.pm. $searchpattern returns the search pattern from Search.pm (by joining the search tokens), and $countcontext returns the number of times the search pattern appears with the given prefix and suffix. For example (each column shows expressions with and without the new variables):

   format="|$percntCALC{\"$IF($SEARCH($searchpattern, $topic),1,0)\"}$percnt \
            $percntCALC{\"$IF($SEARCH((?i), $topic),1,0)\"}$percnt |\
            $countcontext(---\+[^+]*?)([^\n\r]*?) $count(.*?(?i)---\+[^+]*?()[^\n\r]*?.*) |\
            $countcontext(---\+\+.*?)([^\n\r]*?)  $count(.*?(?i)---\+\+.*?()[^\n\r]*?.*) |\
            $countcontext()()                     $count(.*?(?i)().*) |\
            $percntCALC{$SUMPRODUCT(R$ROW():C1..R$ROW():C4, R1:C1..R1:C4)}$percnt |\
            [[$web.WebHome][$web]]: $web.$topic %BR% $date %BR% %USERSWEB%.$wikiname |\
            $summary |" 
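
As a reading aid, the counting rule described for $countcontext can be sketched in Python. This is a guess at the behaviour from the description above, not the patched Search.pm code: the function name `count_context` is hypothetical, and the prefix and suffix used in the test text below are tightened to stay on one line, unlike the expressions in the format string.

```python
import re

def count_context(text, pattern, prefix="", suffix=""):
    """Count matches of `pattern` that sit between `prefix` and `suffix`."""
    regex = re.compile(
        prefix + "(?:" + pattern + ")" + suffix,
        re.IGNORECASE | re.MULTILINE,  # ^ in a prefix then matches at each line start
    )
    return len(regex.findall(text))

# Hypothetical topic text, for illustration only.
text = "---+ Search tips\nHow to search.\n---++ Search syntax\nMore search text.\n"
in_h1 = count_context(text, "search", r"^---\+[^+\n]*?", r"[^\n]*")
in_hn = count_context(text, "search", r"^---\+\+[^\n]*?", r"[^\n]*")
anywhere = count_context(text, "search")
```

Because the pattern is wrapped in a non-capturing group, a joined pattern with several alternatives (such as `search|syntax`) can be passed in unchanged.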

Do these seem like reasonable changes to Search.pm?

-- ClifKussmaul - 19 Apr 2008

These look like useful enhancements. Please file a feature request following the TWikiReleaseManagementProcess. On syntax, I suggest $countcontext((pattern1)(pattern2)) to (1) retain the current convention and (2) make it safer to parse.

-- PeterThoeny - 19 Apr 2008

I've changed the syntax; thanks for the feedback. I will definitely file a feature request once I resolve a few more issues.

-- ClifKussmaul - 21 Apr 2008

I'd like to try your latest solution but I'm not sure what to copy/change...

Regarding the toggle button: I'm using the latest Firefox with JavaScript enabled.

-- CarloSchulz - 25 Apr 2008

Feature request filed: FormattedSearchPatternAndCountContext

-- ClifKussmaul - 10 May 2008

Cool, does that patch work with TWiki 4.1 and 4.2?

-- MartinSeibert - 12 May 2008

I'm using it with 4.1.2. I haven't installed 4.2 but I guess I should...

-- ClifKussmaul - 13 May 2008

I've applied the diff file from FormattedSearchPatternAndCountContext to Search.pm. What next? How do I get the feature to sort (or get sorted) search results by relevance? Which code do I have to add/change and in which file?

-- AlexanderSeith - 14 May 2008

Alexander, once you've updated Search.pm, you will need to add a new search page or modify the existing search page(s). FormattedSearchPatternAndCountContext includes sample search code and output, and I've just attached a sample AdvancedSearch.txt to it. Does this make sense?

-- ClifKussmaul - 14 May 2008

Perfect, exactly what I needed. Thanks a lot!

-- AlexanderSeith - 15 May 2008

Yippieh! Alexander (see above) implemented the patch in our wiki installation. That's a huge improvement. Long live TWiki and its active community! :-)

-- MartinSeibert - 15 May 2008

Is it possible to limit the results to 30 by default? It would be good if users could extend the list when needed by changing the number of results.

-- MartinSeibert - 16 May 2008

I have requested that this be included in the standard distribution: ImplementSortingByRelevanceInStandard

-- MartinSeibert - 16 May 2008

Martin, VarSEARCH's limit parameter should work as normal.

I've considered expanding my AdvancedSearch page to include more options, as in WebSearchAdvanced (and maybe hiding those options with JavaScript, by default).

I guess we could also define preference variables for limit, etc.

-- ClifKussmaul - 16 May 2008

 
Topic attachments
| *Attachment* | *Size* | *Date* | *Who* | *Comment* |
| SearchByRelevance.txt | 9.7 K | 15 May 2008 - 08:47 | AlexanderSeith | Topic for search by relevance, slightly modified and translated into German |
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.