StopWikiWordLinkPluginDev Discussion: Page for developer collaboration, enhancement requests, patches and improved versions on StopWikiWordLinkPlugin contributed by the TWikiCommunity.
• Please let us know what you think of this extension.
• For support, check the existing questions, or ask a new support question in the Support web!
• Please report bugs below

Feedback on StopWikiWordLinkPlugin

First version posted. Enjoy.

-- PeterThoeny - 09 Aug 2006

Hm, maybe this should be enhanced by allowing wildcards or even regular expressions, but I'm also concerned about possible performance impacts. Doesn't this extra check slow down things?

-- FranzJosefSilli - 09 Aug 2006

Do you have a use case for wildcards or regex? I assumed that only some fixed words (such as product names and Scottish names) need to be escaped.

Allowing wildcards or even regex is not so much a performance question, more a security question. The Plugin currently filters out anything not A-Z, a-z, 0-9.

-- PeterThoeny - 09 Aug 2006
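
The filtering described above can be sketched as follows. This is a minimal illustration, not the Plugin's actual code: the helper name `build_stop_words_alternation` is hypothetical, and the comma- or space-separated input format is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw list of stop words into a regex alternation,
# keeping only A-Z, a-z and 0-9 so no regex metacharacters can slip through.
sub build_stop_words_alternation {
    my ($raw) = @_;
    my @words;
    for my $word ( split /[,\s]+/, $raw ) {
        $word =~ s/[^A-Za-z0-9]//g;          # strip everything else
        push @words, $word if length $word;  # skip entries that became empty
    }
    return join '|', @words;
}

print build_stop_words_alternation('RedHat, McIntosh, Uni.*Works'), "\n";
# prints: RedHat|McIntosh|UniWorks
```

Because wildcard characters such as `.*` are stripped, the result can be interpolated into a regex without further escaping.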

Thanks for responding to my request so promptly, Peter! I have noticed one small thing: if the wiki word is the first word in a paragraph, it still becomes a link.

-- LynnwoodBrown - 10 Aug 2006

OK, fixed, and new version posted.

-- PeterThoeny - 10 Aug 2006

I have another instance where one of my stop words becomes a link.

The word is "UniWorks" and it works everywhere except in the following table text:

|  *In Field*  |  *UniWorks will check that...*  |
For some reason the "UniWorks" is still a link in this case.

Any ideas?

-- DuncanKinnear - 01 Mar 2007

Ah, it looks like an interference between table heading and bold. As a workaround, write:
|  *<nop>UniWorks will check that...*  |
(the <nop> gets removed at the end of the rendering process.)

-- PeterThoeny - 01 Mar 2007

Peter, I had a look at the Plugin code and tried changing the stopWordsRE to include the '*' and '_', thus:

    $stopWordsRE = "(^|[\( \n\r\t\|\*_])($stopWords)";
and the link went away.

Is there anything wrong with doing this that you can see?

-- DuncanKinnear - 01 Mar 2007

No, this should work just fine.

-- PeterThoeny - 01 Mar 2007

Thanks Peter.

Now, I did notice when I was looking at the Plugin code that the RE is actually searching for WikiWords that start with the Stop Words defined.

This means that if you define "RedHat" as a Stop Word, then "RedHatInstallationGuide" would not be treated as a link either.

This was not what I expected from reading the Plugin documentation, and I think it is important to add this to the documentation.

Further, I'd like to extend the RE so that it matches the stop words exactly. I was thinking I could just extend the RE to:

$stopWordsRE = "(^|[\( \n\r\t\|\*_])($stopWords)([\) \n\r\t\|\*_]|$)";
and add a $3 after the $2 in the substitution later.

Would this work? Any pitfalls?

Also, could the RE be shortened by using "\s" instead of the " \n\r\t"? This would make the RE:

$stopWordsRE = "([\(\s\|\*_])($stopWords)([\)\s\|\*_])";

-- DuncanKinnear - 01 Mar 2007

Hmmm. I just tested a simpler version of the RE and the "\s" does not match the start and end of line ("^" and "$") as I had previously thought, so you would still have to retain those.

So, my final version of the RE is:

$stopWordsRE = "(^|[\s(|*_])($stopWords)($|[\s)|*_])"
(you shouldn't need to escape the brackets, pipes, asterisks or underscores as long as they are in square brackets)

What do you think?

-- DuncanKinnear - 01 Mar 2007

Testing for the end of the WikiWord is a sensible enhancement. I would escape the special chars within square brackets for readability and to avoid issues with bugs in older Perl versions (if any).

-- PeterThoeny - 01 Mar 2007
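
One possible pitfall of the exact-match RE, sketched here under assumptions: when the trailing delimiter is captured as $3, it is consumed by the match, so a second stop word separated from the first by a single space has no leading delimiter left to match. A lookahead checks the trailing delimiter without consuming it. The `$stopWords` value and the two helper names are made up for the example, and `qr//` is used instead of a double-quoted string.

```perl
use strict;
use warnings;

my $stopWords = 'RedHat|UniWorks';    # assumed, already-sanitized alternation

# Variant from the discussion: trailing delimiter captured and re-emitted as $3.
my $consuming = qr/(^|[\s\(\|\*_])($stopWords)([\s\)\|\*_]|$)/;

# Alternative: a lookahead tests the trailing delimiter without consuming it.
my $lookahead = qr/(^|[\s\(\|\*_])($stopWords)(?=[\s\)\|\*_]|$)/;

sub escape_consuming {
    my ($text) = @_;
    $text =~ s/$consuming/$1<nop>$2$3/g;
    return $text;
}

sub escape_lookahead {
    my ($text) = @_;
    $text =~ s/$lookahead/$1<nop>$2/g;
    return $text;
}

print escape_consuming('RedHat UniWorks'), "\n";   # <nop>RedHat UniWorks
print escape_lookahead('RedHat UniWorks'), "\n";   # <nop>RedHat <nop>UniWorks
print escape_lookahead('RedHatInstallationGuide'), "\n";   # unchanged
```

Both variants leave "RedHatInstallationGuide" alone, and both handle the bold table heading case because `*` is in the leading delimiter set.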

Fair enough on the special characters. It is important to maintain that backward-compatibility.

At least you can save yourself a bit of space by using the '\s'. Or is that a fairly recent regexp addition? I know I only found out about it 3 or 4 years ago.

Aren't REs the most ugly (and yet most powerful) part of Perl? :-)

-- DuncanKinnear - 01 Mar 2007

I think '\s' is safe in Perl (unlike in external grep, aka TWiki topic text search).

-- PeterThoeny - 02 Mar 2007

Another issue with this Plugin is that it adds the <nop> into bracketed links.

For example, if you had 'RedHat' as a Stop Word and you created the following link:

[[Installing RedHat from CD]]

A nop is inserted before the RedHat. This can have one of two side-effects:

  1. If the target topic does not exist, then the topic that is created when the user clicks on the '?' is: InstallingNopRedHatFromCD
  2. If the target topic exists, the link will not point to it because it is looking for a topic called: InstallingNopRedHatFromCD

So, to fix this, I added the following lines after the substitution line in the preRenderingHandler routine:

LOOP: {
    $_[0] =~ s/(\[\[[^]]*?)<nop>/$1/g;
    redo LOOP if ( $_[0] =~ /\[\[[^]]*?<nop>/ ) ;
}

I had to use the LOOP construct to allow for multiple StopWords in one bracketed link.

Any pitfalls that anyone can spot with this?

-- DuncanKinnear - 19 Apr 2007

The current regex does not terminate at the end of a link, i.e. it will also remove nops after the closing brackets. Also, you should only remove the nop if it is part of the link, i.e. exclude the label part of links ([[link][label]]). A safer approach is to call a subroutine for the link part of links, and do a global regex in that subroutine. Untested regex, checking the link part of [[link][label]] and [[link]] links:

$_[0] =~ s/\[\[([^\]]*)\]([\[\]])/_removeNops($1)/geo;

-- PeterThoeny - 20 Apr 2007
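
A runnable sketch of the subroutine approach suggested above, under assumptions: `_removeNops` is only named in the discussion, so its body here is a guess, and `fix_bracket_links` is a hypothetical wrapper. Note that the replacement has to re-emit the brackets matched by the pattern, otherwise they are lost. The /o flag is dropped because the pattern contains no variables.

```perl
use strict;
use warnings;

# Assumed body for the helper named in the discussion:
# strip every <nop> from the link part of a bracketed link.
sub _removeNops {
    my ($link) = @_;
    $link =~ s/<nop>//g;
    return $link;
}

# Hypothetical wrapper around the substitution.
sub fix_bracket_links {
    my ($text) = @_;
    # $1 = link part; $2 = "[" (a label follows) or "]" (end of a plain link).
    $text =~ s/\[\[([^\]]*)\]([\[\]])/'[[' . _removeNops($1) . ']' . $2/ge;
    return $text;
}

print fix_bracket_links('[[Installing <nop>RedHat from CD]]'), "\n";
# prints: [[Installing RedHat from CD]]
print fix_bracket_links('[[Some <nop>RedHat topic][about <nop>RedHat]]'), "\n";
# prints: [[Some RedHat topic][about <nop>RedHat]]
```

The label part and any text outside the brackets keep their nops, since only the first bracketed segment is passed to the helper.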

Some questions and comments:

  1. I'm having trouble with the $ and \s syntax described above. Perl seems to think $ is part of a variable, and doesn't recognize \s. I have Perl 5.8.8 - do others have similar problems or suggestions?
  2. Would it be simpler to use something like this:
    "([[:^word:]])($stopWords)([[:^word:]])"
  3. Did the bracketed link problem described above get solved and uploaded somewhere?
  4. Inspired by ListOfWordsToForceAsLinks, I've almost finished a version that also supports a list of words that must be linked; I'm tentatively calling it ForceStopWikiWordLinkPlugin. Should it be a separate plugin, or subsume this one?

-- ClifKussmaul - 08 Apr 2008
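
A likely explanation for the first question: inside a double-quoted string Perl interpolates variables, so a "$" followed by "|" can be read as the built-in variable $|, and "\s" is not a recognized string escape, so the backslash is dropped before the regex engine ever sees it. The sketch below shows two ways around this. The `$stopWords` value and the helper name `escape_stop_words` are assumptions made for the example.

```perl
use strict;
use warnings;

my $stopWords = 'RedHat|UniWorks';    # assumed, already-sanitized alternation

# Option 1: single-quoted pieces keep "\s" and "$" literal;
# only the double-quoted middle piece interpolates $stopWords.
my $stopWordsRE = '(^|[\s\(\|\*_])' . "($stopWords)" . '($|[\s\)\|\*_])';

# Option 2: qr// compiles the pattern directly, so "\s" is a regex escape
# and the "$" before ")" is read as an end-of-string anchor.
my $stopWordsQR = qr/(^|[\s\(\|\*_])($stopWords)([\s\)\|\*_]|$)/;

# Hypothetical helper: insert <nop> before each exact stop word.
sub escape_stop_words {
    my ( $text, $re ) = @_;
    $text =~ s/$re/$1<nop>$2$3/g;
    return $text;
}

print escape_stop_words( 'Install RedHat today', $stopWordsRE ), "\n";
# prints: Install <nop>RedHat today
print escape_stop_words( 'Install RedHat', $stopWordsQR ), "\n";
# prints: Install <nop>RedHat
```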

ForceStopWikiWordLinkPlugin is going well. I've added and verified a set of test cases. The RE is now:

'(^|[[:^alnum:]])' . "($words)" . '(?=s?([[:^alnum:]]|$))'

The bracketed link issue bothers me, though. It seems risky to insert <nop>s or explicit links, and then try to remove some of them without breaking other code. Would it make more sense to:

  1. insert some unique marker,
  2. identify and remove instances of the marker which should not change the page,
  3. identify and replace instances of the marker that should change the page?

-- ClifKussmaul - 13 Apr 2008
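
The three steps above could look roughly like this for the stop-word case. This is a sketch under assumptions, not the code of the attached plugin: the marker string, the `$stopWords` value and both sub names are hypothetical, and the marker is simply assumed never to occur in topic text.

```perl
use strict;
use warnings;

my $stopWords = 'RedHat|UniWorks';     # assumed, already-sanitized alternation
my $MARKER    = "\x01STOPWORD\x01";    # hypothetical marker, assumed absent from topic text

# Hypothetical helper: remove markers from the link part of a bracketed link.
sub _stripMarker {
    my ($link) = @_;
    $link =~ s/\Q$MARKER\E//g;
    return $link;
}

sub escape_stop_words {
    my ($text) = @_;

    # Step 1: insert the marker before every exact stop word.
    $text =~ s/(^|[\s\(\|\*_])($stopWords)(?=[\s\)\|\*_]|$)/$1$MARKER$2/g;

    # Step 2: remove markers that must not change the page,
    # here the link part of [[link]] and [[link][label]].
    $text =~ s/\[\[([^\]]*)\]([\[\]])/'[[' . _stripMarker($1) . ']' . $2/ge;

    # Step 3: replace the remaining markers with <nop>.
    $text =~ s/\Q$MARKER\E/<nop>/g;

    return $text;
}

print escape_stop_words('See RedHat and [[Installing RedHat from CD]]'), "\n";
# prints: See <nop>RedHat and [[Installing RedHat from CD]]
```

Since <nop> is only written in the last step, the link cleanup in step 2 never has to tell the Plugin's own nops apart from ones an author typed by hand.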

I've attached a current version of ForceStopWikiWordLinkPlugin. It handles bracketed links using the approach described above. The documentation topic includes test cases for both force and stop. Comments welcome.

Should I a) leave it attached here, b) rename this plugin, or c) create a new plugin topic?

-- ClifKussmaul - 14 Apr 2008

I'm for renaming the enhanced plugin to something like ControlWikiWordLink(ing)Plugin.

-- FranzJosefGigler - 14 Apr 2008

I've attached an updated version of ForceStopWikiWordLinkPlugin. I'd appreciate comments if it works or not, and suggestions for enhancing it.

I'd still like to know whether to a) leave it here, b) rename this topic, c) create a new topic.

-- ClifKussmaul - 09 May 2008

We are using this plugin and don't like the behavior of substring matching. For example, if you configure it not to link RedHat but create a topic called RedHatInstallProcedure, neither is linked even though the second should be. Here is a change to two lines that fixes this. Change

$stopWordsRE = "(^|[\( \n\r\t\|])($stopWords)"; # WikiWord preceded by space or parens

to

$stopWordsRE = "(^|[\( \n\r\t\|])($stopWords)([\) \n\r\t\|])"; # WikiWord preceded and followed by space or parens

and

$_[0] =~ s/$stopWordsRE/$1<nop>$2/g;

to

$_[0] =~ s/$stopWordsRE/$1<nop>$2$3/g;

-- RickMach - 09 May 2008

I think the version I just uploaded handles the situation Rick describes, and also behaves correctly when stopwords are in headings or tables of contents. (I have test cases for each of these).

-- ClifKussmaul - 10 May 2008

I had to disable this plugin because lower-privileged users were getting errors when viewing certain TWiki topics.

Lower-privileged users such as TWikiGuest would get RCS errors when viewing /bin/view/TWiki/TWikiUsersGuide, /bin/view/TWiki/AdminToolsCategory, /bin/view/TWiki/AdminDocumentationCategory and /bin/view/TWiki/DeveloperDocumentationCategory.

This is a typical TWiki warning log entry from when the problem occurred: "| 2009-10-31 - 10:35 | RCS: /inusr/bin/rlog -h %FILENAME|F% of .../TWiki/StopWikiWordLinkPlugin.txt,v failed: at /twikidata/twiki/lib/TWiki/Store/RcsWrap.pm line 276."

-- GregNeugebauer - 2009-11-05

I had another chance to look at this. I deleted the .../TWiki/StopWikiWordLinkPlugin.txt,v file and all is working now. I guess there was an unfriendly RCS character or something in one of the older versions.

-- GregNeugebauer - 2009-11-18

Topic revision: r25 - 2009-11-18 - GregNeugebauer
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.