
Feature Proposal: Real Search Engine with better (not alphabetical) results sorting

Motivation

TWiki's current search results are poor because the search engine results pages (SERPs) are sorted alphabetically. It would be better to have a reasonable algorithm that measures relevance and to use that measure as the sort criterion.

Description and Documentation

Create an algorithm that sorts the SERPs by relevance.

Examples

  • Simply perform a search in TWiki that returns more than 5 results to see the problem.

Impact

Implementation

-- Contributors: MartinSeibert - 03 Feb 2008

Discussion

I'm all for it :-)

Possible criteria for ranking might be:

  1. search string matches topic title
  2. page has a tag that matches the search string
  3. search string appears in the content
  4. page is referenced by X topics
  5. ...

-- CarloSchulz - 03 Feb 2008
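The criteria listed above could be combined into a single score, roughly as in the following sketch. The `Topic` record, the weights, and the function names are assumptions made for illustration; none of them come from TWiki or from this discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, simplified topic record; not TWiki's real data model."""
    name: str
    text: str
    tags: set = field(default_factory=set)
    backlinks: int = 0  # number of topics that reference this one


# Illustrative weights only; the discussion does not fix any values.
WEIGHTS = {"title": 10.0, "tag": 5.0, "content": 1.0, "backlink": 0.5}


def score(topic, query):
    """Score one topic against a single search string (case-insensitive)."""
    q = query.lower()
    total = 0.0
    if q in topic.name.lower():               # 1. matches the topic title
        total += WEIGHTS["title"]
    if q in {t.lower() for t in topic.tags}:  # 2. matches a tag
        total += WEIGHTS["tag"]
    if q in topic.text.lower():               # 3. appears in the content
        total += WEIGHTS["content"]
    if total > 0:                             # 4. boost matching topics by backlinks
        total += WEIGHTS["backlink"] * topic.backlinks
    return total


def rank(topics, query):
    """Return matching topics, best score first; ties fall back to the name."""
    scored = [(score(t, query), t) for t in topics]
    hits = [(s, t) for s, t in scored if s > 0]
    return [t for s, t in sorted(hits, key=lambda st: (-st[0], st[1].name))]
```

Falling back to the topic name on ties keeps the current alphabetical order for topics that are otherwise equally relevant.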

Note that this is only needed for full text search.

We must be realistic. Data must be indexed, or results will be even slower than they are now. A good part of that is already covered by initiatives like SearchEngineKinoSearchAddOn, based on Lucene. It would be silly to create a totally new implementation.

-- ArthurClemens - 03 Feb 2008

I agree with Arthur. There are plenty of really nice search engines: lucene, mnogosearch, xapian. Of course they have an indexing phase, but they are surprisingly fast. I think that with just a plugin that, on topic modification, makes them incrementally index only the changed pages, you could have a search that is both high-quality and fresh, that also searches the attachments, and that shows you the hits on the page.

-- ColasNahaboo - 03 Feb 2008
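The incremental indexing idea above can be illustrated with a toy inverted index that re-indexes only the topic that was just saved. This is a sketch under stated assumptions: the class, the `on_topic_saved` hook, and the tokenizer are hypothetical and are not the API of TWiki, Lucene, mnogosearch, or Xapian.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated one topic at a time."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> names of topics containing it
        self._terms_by_topic = {}          # topic name -> terms currently indexed

    @staticmethod
    def _tokenize(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def on_topic_saved(self, name, text):
        """Hypothetical save hook: touch only the terms that changed."""
        new_terms = self._tokenize(text)
        old_terms = self._terms_by_topic.get(name, set())
        for term in old_terms - new_terms:   # terms removed by the edit
            self._postings[term].discard(name)
            if not self._postings[term]:
                del self._postings[term]
        for term in new_terms - old_terms:   # terms added by the edit
            self._postings[term].add(name)
        self._terms_by_topic[name] = new_terms

    def search(self, term):
        """Return the names of topics containing the term, sorted."""
        return sorted(self._postings.get(term.lower(), set()))
```

Because only the difference between the old and new term sets is written, the cost of a save depends on the size of the edit rather than on the size of the web.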

This feature proposal is obviously a very desired feature from our users. But the proposal is not the kind with enough implementation detail so we can take a decision to go ahead and implement it.

Nonetheless, we should take this requirement into serious consideration when we design the new storage model for TWiki 5.0. Not only should it support fast formatted searches for TWiki applications, but we should also consider a simple but effective relevance algorithm.

I think we all want this feature. The question is not IF but HOW. IMHO.

-- KennethLavrsen - 03 Feb 2008

The actual implementation is indeed way too technical for me. But like Kenneth said, the feature itself is something all "used to google" users will expect.

To me it doesn't matter whether it is achieved via a plugin (which should then be a default plugin) or via changes to the core. As a normal user I simply don't care.

So I can help with what's needed from the user's perspective, but unfortunately not from the developer's perspective :-(

-- CarloSchulz - 04 Feb 2008

The big problem I had with the lucene plugins is that a lot of changes to TWiki are needed to integrate them properly. For a casual TWiki admin that's a big hassle (even without the occasional TWiki upgrade).

A 'drop in place' lucene (or whatever) search plugin would be very welcome.

-- JosMaccabiani - 04 Feb 2008

Is SearchEngineKinoSearchAddOn already a fully functional search engine? Could anybody post a screenshot of a results page? I would try a lot to overcome today's poor results list. (I do not mean to offend anybody. But a lot of our staff quizzed me with good questions on why the TWiki-search has an alphabetical order.)

-- MartinSeibert - 04 Feb 2008

"But a lot of our staff quizzed me with good questions on why the TWiki-search has an alphabetical order"

Same with me...

-- CarloSchulz - 05 Feb 2008

Kenneth said "This feature proposal is obviously a very desired feature from our users. But the proposal is not the kind with enough implementation detail so we can take a decision to go ahead and implement it." What is needed to overcome this state? Do you need a specific definition of the sorting algorithm?

-- MartinSeibert - 05 Feb 2008

Hi Martin,

if you have a comment on a plugin or add-on, please post it in the PluginNameDev topic and not in the plugin topic itself.

On your question: The KinoSearch add-on is a "fully functional search engine" (whatever that means). You can search very fast on huge webs and (in addition to the normal TWiki search) you can search in attached documents. You can also search for values in form fields.

You cannot search with wildcards or regular expressions (that's a restriction of KinoSearch).

It is not implemented as a plugin. Thus you cannot use it in the form %KINOSEARCH{...}% the way you use the normal TWiki %SEARCH{...}%. Furthermore, you cannot define any formatting for the result.

The add-on comes with ready-to-go search forms that you can use and/or modify to your needs. It also gives examples of how to put it in the left bar or even replace the top search field.

The output is sorted by relevance: this depends mainly on the number of appearances of the search word. Additionally, an appearance in the topic name counts for more.

The add-on does not automatically update the index on any change of a topic in the web. You need to start an update process, but that can be done by a cron job.

Here is a snapshot of a KinoSearch result page. It looks very much like the result of a "normal" TWiki search. In addition, the found keywords are highlighted.

KinoSearchResult.jpg

-- MarkusHesse - 05 Feb 2008

I like this screenshot. Kenneth: Why wouldn't this AddOn simply be included in the next update? Looks like a quick win to me.

-- MartinSeibert - 06 Feb 2008

There is a small problem with the presented search options.

  1. "Do not" and "do" are mixed. I would say "Show summaries" with a check mark.
  2. Display and search output parameters are mixed. "Show summaries" is about presenting topic names with additional context, but the topics will stay the same; "show locked topics" will extend the result set. The same goes for "total matches" (which does not seem that important to show or hide on a results page) and "limit to", which again changes the result set. Why not put the number of found topics in that box by default?
  3. What about sort options? Are results sorted by relevance? What if I don't want that all the time?
  4. What if I just want to search in topic names?

Just to say: there are a lot of elements that can be improved to give a better user experience.

-- ArthurClemens - 06 Feb 2008

Hi Arthur,

with KinoSearch you can search for topic titles by typing a search string like topic:WebHome. You can also search only in the topic body with text:kino. For details see SearchEngineKinoSearchAddOn#Searching_with_kinosearch.

Please let me emphasize that KinoSearch is not intended to replace the normal TWiki search. It's only an AddOn for this specific kind of search: fast full text search including attachments.

If the idea from ResultSets is realised, KinoSearch could generate result sets that are rendered elsewhere, and thus things like alternative sorting and formatting of the result could be done there.

-- MarkusHesse - 07 Feb 2008
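To illustrate how field-prefixed search strings such as topic:WebHome and text:kino can be interpreted, here is a minimal sketch. It is not KinoSearch's actual query parser; the function names, the dict-based document, and the rule that unprefixed terms search both fields are assumptions.

```python
KNOWN_FIELDS = {"topic", "text"}  # the two prefixes mentioned above


def parse_query(query):
    """Split a search string into (field, term) clauses.

    Terms without a known prefix get field None, meaning "any field".
    """
    clauses = []
    for token in query.split():
        prefix, sep, rest = token.partition(":")
        if sep and prefix in KNOWN_FIELDS and rest:
            clauses.append((prefix, rest))
        else:
            clauses.append((None, token))
    return clauses


def matches(doc, clauses):
    """doc is a hypothetical dict with 'topic' and 'text' strings.

    Every clause must match (AND semantics, an assumption of this sketch).
    """
    for fld, term in clauses:
        term = term.lower()
        fields = [fld] if fld else ["topic", "text"]
        if not any(term in doc[f].lower() for f in fields):
            return False
    return True
```

For example, `matches({"topic": "WebHome", "text": "about kino"}, parse_query("topic:WebHome text:kino"))` evaluates to `True` in this sketch.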

Hm ... maybe TWiki-Users should buy this then? :-) http://www.google.com/enterprise/gsa/

-- MartinSeibert - 11 Feb 2008

It's not hard to sort the default results by how often the search term appears. See OrderSearchResultsMostRelevantFirst.

SearchEngineKinoSearchAddOn looks pretty cool, though.

-- ClifKussmaul - 08 Apr 2008

You can rank pages based on the number of times the word occurs in the topic text, yes, and maybe weight hits differently if they appear in special metadata of the topic (topic parent, metadata, topic name, topic title, keywords, ...). Don't forget to count the number of backrefs. What about personalized rankings: what is relevant for you may not be relevant for me ... based on the topics I normally visit. Edits by top contributors may be a good indicator for ranking topics as well. Read more in Programming Collective Intelligence.

Just want to say: this is far from trivial - you've just hit a key aspect of web2.0.

-- MichaelDaum - 08 Apr 2008

To start with, a simple algorithm that allows relevance sorting would be enough, in my opinion. It could work like this: relevance = (hits in the first headline * 5) + (hits in other headlines * 3) + (hits in the body and all other inline elements * 2) + (hits in attachments, if possible * 1)

What do you think about that? Too complicated?

-- MartinSeibert - 08 Apr 2008
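The formula proposed above is small enough to sketch directly. The following assumes headlines are written in TWiki markup as lines starting with `---+`, counts plain substring hits (so "search" also matches inside "research"), and takes attachment contents as already-extracted text; the function names are hypothetical.

```python
import re

# A TWiki-style heading line: "---+ Title", "---++ Subtitle", optionally with "!!".
HEADING = re.compile(r"^---\++(?:!!)?\s*(.*)$")


def count_hits(text, term):
    """Case-insensitive count of non-overlapping substring occurrences."""
    return text.lower().count(term.lower())


def relevance(topic_text, term, attachment_texts=()):
    """first headline * 5 + other headlines * 3 + body * 2 + attachments * 1."""
    first_headline = 0
    other_headlines = 0
    body = 0
    seen_headline = False
    for line in topic_text.splitlines():
        m = HEADING.match(line)
        if m:
            hits = count_hits(m.group(1), term)
            if seen_headline:
                other_headlines += hits
            else:
                first_headline += hits
                seen_headline = True
        else:
            body += count_hits(line, term)
    attachments = sum(count_hits(t, term) for t in attachment_texts)
    return first_headline * 5 + other_headlines * 3 + body * 2 + attachments * 1
```

Sorting the result list by this value in descending order gives the relevance ordering; computing it at query time still scans every topic, which is why the discussion keeps returning to indexing and caching.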

Ideally also add a weighted number of backlinks. This clearly requires a caching solution.

-- PeterThoeny - 09 Apr 2008

Currently, TWiki does a bad job of tracking backrefs. Actually, it doesn't track them at all, and that's the problem. Reworking only this (and reusing it for page ranking here) is worth a proposal of its own. Keeping track of backrefs must go into the core, as some core features depend on it. Even worse, "oopsmore->find backrefs" and "rename/delete this topic" both try to find backrefs independently, each using an implementation of its own, finding different results, and using an active search, i.e. a grep over all data in all webs. See Bugs:Item4212.

-- MichaelDaum - 09 Apr 2008
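As a rough illustration of tracking backrefs in one place instead of grepping on demand, the sketch below builds a reverse link map in a single pass. The WikiWord pattern is deliberately simplified and ignores bracket links and cross-web links, and the input format (a dict of topic name to raw text) is an assumption, not TWiki's store API.

```python
import re
from collections import defaultdict

# Simplified WikiWord: two or more capitalized word parts, e.g. WebHome.
WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def build_backlinks(topics):
    """topics: dict mapping topic name -> raw topic text (hypothetical input).

    Returns a dict mapping each existing topic to the set of topics linking to it.
    """
    backlinks = defaultdict(set)
    for source, text in topics.items():
        for target in set(WIKIWORD.findall(text)):
            # Ignore self-links and links to topics that do not exist.
            if target != source and target in topics:
                backlinks[target].add(source)
    return backlinks
```

With such a map kept up to date, both "find backrefs" and rename/delete could read the same data, and `len(backlinks[name])` would be available as a ranking signal.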

I expanded the script I put in OrderSearchResultsMostRelevantFirst to something similar to what Martin suggested. It also weights hits in the title. It displays results in a table sorted by "relevance", with hidden columns for the individual terms in case you want to see them.

-- ClifKussmaul - 09 Apr 2008

Cool. I will check that out some time soon.

-- MartinSeibert - 10 Apr 2008

In my opinion, the issue is solved. I requested an implementation in the standard: ImplementSortingByRelevanceInStandard

-- MartinSeibert - 16 May 2008

Topic attachments
Attachment | History | Size | Date | Who | Comment
KinoSearchResult.jpg | r1 | 109.5 K | 2008-02-05 11:47 | MarkusHesse | Snapshot of a KinoSearch result page
Topic revision: r25 - 2008-08-06 - MartinSeibert
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.