
Brainstorming on AutomaticLinkSuggestionPlugin

I'm implementing a plugin to suggest places where an internal link could be inserted.

Rationale: sometimes you write a topic without knowing that some other topic could be referenced to explain things better; the system should suggest the link during Preview.

Implementation

  • for each topic name in the web:
    • split the topic name into words (e.g. split just before each [A-Z][a-z])
    • remove stop words (single letters, the, an, a, and, ...)
    • limit the number of words to 4, to prevent HostileUsers from writing VeryVeryVeryLongTopicNamesJustToBreakTheToy (10! permutations)
    • consider both the singular and plural forms of each word
    • compute all permutations of the words
    • search the text for these permutations, case-insensitively, allowing at most 2 words between them (to make up for the removed stop words) but not commas, full stops, etc. (we want to keep the words together)
    • highlight the found patterns (a nice background color and a tooltip, made with an ALT string in an IMG tag, would be perfect)
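The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's Perl code: `STOP_WORDS`, `MAX_WORDS`, `topic_words`, `permutation_patterns` and `suggest_links` are hypothetical names, the stop-word list is an assumption, `itertools.permutations` stands in for List::Permutor, and plurals are handled crudely by allowing an optional "s"/"es" suffix. Names longer than the cap are skipped rather than truncated, which is one possible reading of the 4-word limit.

```python
import itertools
import re

# Hypothetical stop-word list; the discussion only names "single letters, the, an, a, and, ..."
STOP_WORDS = {"the", "an", "a", "and", "of", "to", "in", "on", "for"}
MAX_WORDS = 4  # names longer than this are skipped (avoids 10! permutations)

def topic_words(topic):
    """Split a WikiWord just before each [A-Z][a-z], lowercase, drop stop words."""
    parts = re.split(r"(?=[A-Z][a-z])", topic)
    return [p.lower() for p in parts
            if len(p) > 1 and p.lower() not in STOP_WORDS]

def permutation_patterns(words):
    """One case-insensitive regex per ordering of the words.

    Consecutive words may be separated by whitespace and at most two other
    words, but never by punctuation such as commas or full stops.
    """
    if not words or len(words) > MAX_WORDS:
        return []
    gap = r"\s+(?:\w+\s+){0,2}"
    patterns = []
    for perm in itertools.permutations(words):
        # crude plural handling (an assumption): optional "s" or "es" after each word
        body = gap.join(re.escape(w) + r"(?:e?s)?" for w in perm)
        patterns.append(re.compile(r"\b" + body + r"\b", re.IGNORECASE))
    return patterns

def suggest_links(text, topics):
    """Return (topic, matched text) pairs for every permutation found in text."""
    hits = []
    for topic in topics:
        for pattern in permutation_patterns(topic_words(topic)):
            for match in pattern.finditer(text):
                hits.append((topic, match.group(0)))
    return hits

text = "We need a suggestion for each link. Link the suggestion here."
print(suggest_links(text, ["LinkSuggestion"]))
```

For "LinkSuggestion" this finds both "Link the suggestion" and "suggestion for each link", while "link. Link" is not bridged because of the full stop. In practice the patterns would probably be compiled once per web and cached rather than rebuilt on every preview.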

Normally topic names are only a few words long, so the number of permutations (N! for N words) is small.
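For scale, here are the counts behind the 4-word cap. The second line assumes, purely for illustration, that the singular and plural form of every word were enumerated as separate patterns instead of being matched inline:

```latex
4! = 24 \qquad 10! = 3\,628\,800 \qquad 4! \cdot 2^{4} = 24 \cdot 16 = 384
```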

Possible improvements

  • define a set of word aliases and look for more patterns
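One way the alias idea might work, sketched under assumptions: the `ALIASES` table and the `alias_variants` helper below are hypothetical, and each variant would then go through the same permutation search as the original word list.

```python
import itertools

# Hypothetical alias table: each key lists the words treated as interchangeable.
ALIASES = {
    "doc": {"doc", "document", "documentation"},
    "config": {"config", "configuration", "settings"},
}

def alias_variants(words):
    """Yield every word list obtained by swapping each word for one of its aliases."""
    # Words without an entry stand only for themselves.
    choices = [sorted(ALIASES.get(w, {w})) for w in words]
    for combo in itertools.product(*choices):
        yield list(combo)

for variant in alias_variants(["plugin", "doc"]):
    print(variant)
```

Note that the number of patterns grows as the product of the alias-set sizes times N!, so aliases make the word cap even more important.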

-- AndreaSterbini - 29 Nov 2000

I like this idea: it would avoid one of the problems that arise once you have a lot of similarly named items.

I'm concerned about the performance, however. Until/unless TWiki runs under mod_perl, anything like this that slows down the preview would deter many users. The speed of updating and previewing when a category table is present is already an issue for users I'm trying to coax into using TWiki.

-- StanleyKnutson - 07 Dec 2000

Yes, I forgot to say that the suggestion should be optional, e.g. one of:

  • a checkbox near the "Preview changes" button in the edit template
  • a button in the preview template that opens a new window with the suggestions
  • other ideas welcome

-- AndreaSterbini - 07 Dec 2000

Here is the SuggestTopicLinkPlugin ... it uses the List::Permutor pure Perl module (see http://www.cpan.org)

I also enclose a preview.suggest script and an edit.suggest.tmpl template for making suggestions during preview ...

The package also contains a suggest script (view with suggestions ON).

NOTICE: SuggestTopicLinkPlugin requires an updated wikiplugins.pm ... get it from TWikiPluginAPI.

TODO:

  • do a better singular/plural transformation
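A slightly better singular/plural transformation might look like the sketch below. The rules and the `IRREGULAR` table are assumptions covering a few common English cases; a stemmer would be a more thorough alternative. The idea is to normalise both topic words and text words with `singular` before comparing them.

```python
import re

# A few hypothetical irregular plurals; a real table would be larger.
IRREGULAR = {"children": "child", "people": "person", "men": "man", "women": "woman"}

def singular(word):
    """Heuristically map an English plural to its singular form."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"            # categories -> category
    if re.search(r"(ss|x|z|ch|sh)es$", w):
        return w[:-2]                  # classes -> class, patches -> patch
    if w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]                  # topics -> topic, houses -> house
    return w                           # already singular (or unknown)

print([singular(w) for w in ["Categories", "patches", "topics", "status"]])
```

Words such as "status" and "analysis" are left alone by the "us"/"is" exception, at the cost of missing plurals like "buses".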

-- AndreaSterbini - 1 Feb 2001

Moved the OPTIONWEBS variable to TWikiPreferences

-- AndreaSterbini - 02 Feb 2001

Sounds like a good idea, especially on webs that lots of people use.

-- MartinCleaver - 09 May 2001

I think that I would like to use this AutomaticLinkSuggestionPlugin not just in preview, but also during normal viewing. After all, the page with permuted names may have been created after the page that uses a different permutation.

This should not be computationally too expensive if we can do some preprocessing of the pages in the web. Rather than generating and trying all N! permutations, it should be possible to build a single state machine that is the composition of a recognizer state machine for each wiki page. State explosion would be avoided by first checking whether all of the words in a MultiWikiWordLinkOfLengthN are present in the last N words seen; if all the words are found, they either hit a single page or more than one. If a single page, you are done. If more than one, you have to choose which.
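The order-independent "are all the words present" check can be sketched with a plain hash table keyed on the sorted word tuple of each page name. This is a simpler stand-in for the composed state machine described above, not an implementation of it: `build_index` and `scan_windows` are hypothetical names, topic names are assumed to be already split into lowercase word lists, and only contiguous windows are checked (no gap words).

```python
from collections import deque

def build_index(topic_word_lists):
    """Map each topic's word multiset (sorted, so order-independent) to topic names."""
    index = {}
    for topic, words in topic_word_lists.items():
        index.setdefault(tuple(sorted(words)), []).append(topic)
    return index

def scan_windows(text_words, index, max_len):
    """At each word, test the last 1..max_len words against the index."""
    window = deque(maxlen=max_len)   # the last max_len words seen
    hits = []
    for pos, word in enumerate(text_words):
        window.append(word)
        recent = list(window)
        for k in range(1, len(recent) + 1):
            key = tuple(sorted(recent[-k:]))
            # More than one topic per key means the caller must choose.
            for topic in index.get(key, ()):
                hits.append((pos - k + 1, topic))
    return hits

index = build_index({"LinkSuggestion": ["link", "suggestion"],
                     "SuggestionLink": ["suggestion", "link"]})
print(scan_windows("a suggestion link here".split(), index, 2))
```

Sorting each window costs roughly O(N² log N) per word, which the XOR-accumulator scheme avoids.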

Implement it by creating a pipelined Bloom filter. For every word seen, compute a hash code, e.g. a 1-of-N hash code with a single bit set. (I think an M-of-N hash code would work too, for small N.) Keep a shift register of the hash codes of individual words; its depth is the length of the longest wiki page name, in words. Keep an array of accumulators holding the combined hash codes of the last 1..N words. As each word is parsed, compute its hash, XOR it into each of the accumulators, and push it onto the shift register. When a word drops out of the k-word window, XOR it out of accumulator k; once it is older than the Nth word, remove it from the shift register. After each word, look up the accumulators in a table that maps hash codes to wiki pages.
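A minimal sketch of the shift register and XOR accumulators, under stated assumptions: the 1-of-N code is derived from `zlib.crc32` modulo a hypothetical `HASH_BITS` width, and `word_hash`, `build_table` and `scan_xor` are illustrative names. Accumulator k drops the word that has just left its own k-word window. Because XOR codes can collide (and a repeated word cancels itself), each table hit is only a candidate and is verified against the actual words.

```python
import zlib
from collections import deque

HASH_BITS = 64  # hypothetical width of the 1-of-N code

def word_hash(word):
    """1-of-N code: a single bit chosen by a stable hash of the word."""
    return 1 << (zlib.crc32(word.encode()) % HASH_BITS)

def build_table(topic_word_lists):
    """Map the XOR of each topic's word codes to (topic, sorted words) candidates."""
    table = {}
    for topic, words in topic_word_lists.items():
        words = [w.lower() for w in words]
        code = 0
        for w in words:
            code ^= word_hash(w)
        table.setdefault(code, []).append((topic, tuple(sorted(words))))
    return table

def scan_xor(text_words, table, max_len):
    # Shift register; one extra slot holds the word leaving the widest window.
    register = deque(maxlen=max_len + 1)
    acc = [0] * (max_len + 1)  # acc[k] = XOR of the codes of the last k words
    hits = []
    for pos, word in enumerate(text_words):
        word = word.lower()
        h = word_hash(word)
        register.append((word, h))
        for k in range(1, max_len + 1):
            acc[k] ^= h                          # XOR the new word into every window
            if len(register) > k:
                acc[k] ^= register[-k - 1][1]    # XOR out the word leaving window k
        recent = [w for w, _ in register]
        for k in range(1, min(len(register), max_len) + 1):
            for topic, words in table.get(acc[k], ()):
                if tuple(sorted(recent[-k:])) == words:  # reject hash collisions
                    hits.append((pos - k + 1, topic))
    return hits

table = build_table({"LinkSuggestion": ["link", "suggestion"], "WebHome": ["web", "home"]})
print(scan_xor("a suggestion link here".split(), table, 2))
```

The accumulator update is O(N) per word; the verification step only runs on table hits.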

If there are W words in a page, and N words in the longest page name, this requires O(N) work at each word, => O(WN) overall. OK, so you would probably want to eliminate really long page names... but it's still likely to be pretty fast.

-- AndyGlew - 28 Mar 2006

Topic attachments:
  • SuggestTopicLinkPlugin.zip (compressed Zip archive, 10.6 K, 2001-02-02 11:01, AndreaSterbini): A Plugin that suggests links for you ...
Topic revision: r11 - 2006-03-29 - PeterThoeny
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.