First version posted. Enjoy.
--
PeterThoeny - 09 Aug 2006
Hm, maybe this should be enhanced by allowing wildcards or even regular expressions, but I'm also concerned about possible performance impacts. Doesn't this extra check slow things down?
--
FranzJosefSilli - 09 Aug 2006
Do you have a use case for wildcards or regex? I assumed that only a few fixed words (such as product names and Scottish names) need to be escaped.
Allowing wildcards or even regex is less a performance question than a security question. The Plugin currently filters out any character that is not A-Z, a-z or 0-9.
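As a rough illustration of that filtering, here is a minimal Perl sketch. It assumes the stop words arrive as one comma-separated setting string, and the helper name `buildStopWordsRE` is hypothetical rather than the Plugin's actual code. Because every character outside A-Z, a-z and 0-9 is stripped, no regex metacharacter from the configuration can reach the final pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a comma-separated stop word setting into a
# regex alternation, keeping only the characters A-Z, a-z and 0-9.
sub buildStopWordsRE {
    my ($setting) = @_;
    my @words =
        grep { length }    # drop entries that ended up empty
        map  { my $w = $_; $w =~ s/[^A-Za-z0-9]//g; $w }
        split /,/, $setting;
    return join '|', @words;
}

# "Uni.*Works" loses its wildcard and becomes a plain word
my $stopWords = buildStopWordsRE('RedHat, McDonald, Uni.*Works');
print "$stopWords\n";
```

A wildcard typed into the setting therefore degrades into an ordinary word instead of becoming part of the pattern.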
--
PeterThoeny - 09 Aug 2006
Thanks for responding to my request so promptly, Peter! I have noticed one small thing: if the WikiWord is the first word in a paragraph, it still becomes a link.
--
LynnwoodBrown - 10 Aug 2006
OK, fixed, and new version posted.
--
PeterThoeny - 10 Aug 2006
I have another instance where one of my stop words becomes a link.
The word is "UniWorks" and it works everywhere except in the following table text:
| *In Field* | *UniWorks will check that...* |
For some reason the "UniWorks" is still a link in this case.
Any ideas?
--
DuncanKinnear - 01 Mar 2007
Ah, it looks like interference between the table heading and bold markup. As a workaround, write:
| *<nop>UniWorks will check that...* |
(The <nop> is removed at the end of the rendering process.)
--
PeterThoeny - 01 Mar 2007
Peter, I had a look at the Plugin code and tried changing the stopWordsRE to include '*' and '_', thus:
$stopWordsRE = "(^|[\( \n\r\t\|\*_])($stopWords)";
and the link went away.
Is there anything wrong with doing this that you can see?
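A small self-contained check of this change, assuming the Plugin wraps a stop word as `$1<nop>$2` (the substitution form is an assumption here) and using the table cell from above: with '*' and '_' added to the leading character class, a stop word directly after the bold marker now gets its <nop>.

```perl
use strict;
use warnings;

my $stopWords = 'UniWorks';    # hypothetical stop word list

# Leading delimiters before the change: ( space \n \r \t |
my $oldRE = qr/(^|[\( \n\r\t\|])($stopWords)/;
# Leading delimiters after the change: also * (bold) and _ (italic)
my $newRE = qr/(^|[\( \n\r\t\|\*_])($stopWords)/;

my $cell = '| *In Field* | *UniWorks will check that...* |';

(my $old = $cell) =~ s/$oldRE/$1<nop>$2/g;    # no match: '*' precedes the word
(my $new = $cell) =~ s/$newRE/$1<nop>$2/g;    # match: '*' is now accepted

print "$old\n$new\n";
```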
--
DuncanKinnear - 01 Mar 2007
No, this should work just fine.
--
PeterThoeny - 01 Mar 2007
Thanks Peter.
Now, I did notice when I was looking at the Plugin code that the RE is actually searching for WikiWords that start with the Stop Words defined.
This means that if you define "RedHat" as a Stop Word, then "RedHatInstallationGuide" would not be treated as a link either.
This was not what I expected from reading the Plugin documentation, and I think it is important to add this to the documentation.
Further, I'd like to extend the RE so that it matches the stop words exactly. I was thinking I could just extend the RE to:
$stopWordsRE = "(^|[\( \n\r\t\|\*_])($stopWords)([\) \n\r\t\|\*_]|$)";
and add a $3 after the $2 in the substitution later.
Would this work? Any pitfalls?
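A minimal sketch comparing the current prefix match with the proposed exact match, again assuming a `$1<nop>$2` substitution (plus `$3` for the new RE); the stop word and sample text are made up for illustration.

```perl
use strict;
use warnings;

my $stopWords = 'RedHat';    # hypothetical configured stop word

# Prefix match: only the character *before* the word is checked
my $prefixRE = qr/(^|[\( \n\r\t\|\*_])($stopWords)/;
# Exact match: the character *after* the word must be a delimiter too
my $exactRE  = qr/(^|[\( \n\r\t\|\*_])($stopWords)([\) \n\r\t\|\*_]|$)/;

my $text = 'See RedHat and RedHatInstallationGuide today';

(my $prefix = $text) =~ s/$prefixRE/$1<nop>$2/g;
(my $exact  = $text) =~ s/$exactRE/$1<nop>$2$3/g;   # $3 puts the delimiter back

print "$prefix\n$exact\n";
```

With the prefix RE both words get a <nop>; with the exact RE only the standalone RedHat does, so RedHatInstallationGuide can still become a link.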
Also, could the RE be shortened by using "\s" instead of the " \n\r\t"? This would make the RE:
$stopWordsRE = "([\(\s\|\*_])($stopWords)([\)\s\|\*_])";
--
DuncanKinnear - 01 Mar 2007
Hmmm. I just tested a simpler version of the RE and the "\s" does not match the start and end of line ("^" and "$") as I had previously thought, so you would still have to retain those.
So, my final version of the RE is:
$stopWordsRE = "(^|[\s(|*_])($stopWords)($|[\s)|*_])"
(you shouldn't need to escape the brackets, pipes, asterisks or underscores as long as they are in square brackets)
What do you think?
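One pitfall worth checking with a trailing capture group: under /g, the delimiter consumed by the third group cannot also serve as the leading delimiter of the next stop word, so two stop words separated by a single space are not both handled. The sketch below (assuming a `$1<nop>$2` style substitution and a made-up word list) shows this, plus a variant that checks the trailing character with a zero-width lookahead `(?=...)` so nothing is consumed and no `$3` is needed.

```perl
use strict;
use warnings;

my $stopWords = 'RedHat|UniWorks';    # hypothetical stop word list

# Trailing delimiter is captured, i.e. consumed by the match
my $consumingRE = qr/(^|[\s(|*_])($stopWords)($|[\s)|*_])/;
# Trailing delimiter is only checked by a lookahead, not consumed
my $lookaheadRE = qr/(^|[\s(|*_])($stopWords)(?=$|[\s)|*_])/;

my $text = 'RedHat UniWorks';

(my $consumed  = $text) =~ s/$consumingRE/$1<nop>$2$3/g;
(my $lookahead = $text) =~ s/$lookaheadRE/$1<nop>$2/g;

print "$consumed\n$lookahead\n";
```

The consuming version leaves UniWorks without a <nop>, because the space before it was already eaten by the RedHat match.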
--
DuncanKinnear - 01 Mar 2007
Testing for the end of the WikiWord is a sensible enhancement. I would still escape the special characters within the square brackets, for readability and to avoid issues with bugs in older Perl versions (if any).
--
PeterThoeny - 01 Mar 2007
Fair enough on the special characters. It is important to maintain backward compatibility.
At least you can save yourself a bit of space by using '\s'. Or is that a fairly recent regexp addition? I only found out about it 3 or 4 years ago.
Aren't REs the ugliest (and yet most powerful) part of Perl?
--
DuncanKinnear - 01 Mar 2007
I think '\s' is safe in Perl (unlike in external grep, aka TWiki topic text search).
--
PeterThoeny - 02 Mar 2007
Another issue with this Plugin is that it adds the <nop> into bracketed links.
For example, if you had 'RedHat' as a Stop Word and you created the following link:
[[Installing RedHat from CD]]
A <nop> is inserted before RedHat. This can have one of two side effects:
- If the target topic does not exist, then the topic that is created when the user clicks on the '?' is: InstallingNopRedHatFromCD
- If the target topic exists, the link will not point to it because it is looking for a topic called: InstallingNopRedHatFromCD
So, to fix this, I added the following lines after the substitution line in the preRenderingHandler routine:
LOOP: {
$_[0] =~ s/(\[\[[^]]*?)<nop>/$1/g;
redo LOOP if ( $_[0] =~ /\[\[[^]]*?<nop>/ ) ;
}
I had to use the LOOP construct to allow for multiple StopWords in one bracketed link.
Any pitfalls that anyone can spot with this?
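To see why the loop is needed, here is a minimal sketch of the same idea written with Perl's `1 while s///` idiom, which behaves like the LOOP/redo construct. After one /g pass removes the first <nop> inside a [[...]], matching resumes after that point, and because the pattern has to start with [[ it cannot reach a second <nop> in the same link; repeating the substitution until it no longer matches handles any number of stop words. The sample text is made up.

```perl
use strict;
use warnings;

my $text = '[[Installing <nop>RedHat on <nop>UniWorks]] and <nop>RedHat';

# A single /g pass removes only the first <nop> in each bracketed link
(my $onePass = $text) =~ s/(\[\[[^\]]*?)<nop>/$1/g;

# Repeat until no <nop> is left between [[ and the next ]
my $looped = $text;
1 while $looped =~ s/(\[\[[^\]]*?)<nop>/$1/g;

print "$onePass\n$looped\n";
```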
--
DuncanKinnear - 19 Apr 2007
The current regex does not terminate at the end of a link, e.g. it will remove nops after the closing brackets. Also, you should only remove the nop if it is part of the link, i.e. exclude the label part of links ([[link][label]]). A safer approach is to call a subroutine for the link part of links, and do a global regex in that subroutine. Untested regex, checking the link part of [[link][label]] and [[link]] links:
$_[0] =~ s/\[\[([^\]]*)\]([\[\]])/'[['._removeNops($1).']'.$2/ge;
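A minimal sketch of how the subroutine approach might be fleshed out. `_removeNops` is the name used in the suggestion, but its body here is an assumption, as is the hypothetical wrapper `_rebuildLink`. The replacement has to put back the [[ and ] consumed by the pattern, plus the captured character after the first ] ('[' for [[link][label]], ']' for [[link]]), so only the link part loses its <nop>s and the label keeps them.

```perl
use strict;
use warnings;

# Assumed body for the helper named in the suggestion:
# strip every <nop> from the link part of a bracketed link.
sub _removeNops {
    my ($link) = @_;
    $link =~ s/<nop>//g;
    return $link;
}

# Hypothetical wrapper: rebuild the consumed brackets around the cleaned
# link part. $sep is '[' for [[link][label]] and ']' for [[link]].
sub _rebuildLink {
    my ($link, $sep) = @_;
    return '[[' . _removeNops($link) . ']' . $sep;
}

sub cleanBracketLinks {
    my ($text) = @_;
    $text =~ s/\[\[([^\]]*)\]([\[\]])/_rebuildLink($1, $2)/ge;
    return $text;
}

my $plain   = cleanBracketLinks('[[Installing <nop>RedHat from CD]]');
my $labeled = cleanBracketLinks('[[<nop>RedHatGuide][about <nop>RedHat]]');
print "$plain\n$labeled\n";
```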
--
PeterThoeny - 20 Apr 2007
Some questions and comments:
- I'm having trouble with the $ and \s syntax described above. Perl seems to think $ is part of a variable, and doesn't recognize \s. I have perl 5.8.8 - do others have similar problems or suggestions?
- Would it be simpler to use something like this:
"([[:^word:]])($stopWords)([[:^word:]])"
- Did the bracketed link problem described above get solved and uploaded somewhere?
- Inspired by ListOfWordsToForceAsLinks, I've almost finished a version that also supports a list of words that must be linked; I'm tentatively calling it ForceStopWikiWordLinkPlugin. Should it be a separate plugin, or subsume this one?
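The `$` and `\s` trouble is most likely a quoting issue rather than a Perl version issue: inside a double-quoted string, `$|` is interpolated as Perl's autoflush variable (normally "0"), and `"\s"` is not a string escape, so Perl drops the backslash (warning "Unrecognized escape \s passed through" under `use warnings`). Two ways around it are sketched below with a hypothetical word list: build the pattern from single-quoted pieces and interpolate only the word list, or use `qr//`, where `$|` is not interpolated because it looks like an end-of-string test.

```perl
use strict;
use warnings;

my $stopWords = 'RedHat|UniWorks';    # hypothetical stop word list

# In a double-quoted string "$|" is the autoflush variable, so the
# end-of-line alternative silently turns into its value (usually "0").
my $broken = "($|x)";

# Fix 1: single-quoted pieces; only the word list is interpolated
my $quotedRE = '(^|[\s(|*_])' . "($stopWords)" . '($|[\s)|*_])';
# Fix 2: qr//, where $| is not interpolated
my $qrRE = qr/(^|[\s(|*_])($stopWords)($|[\s)|*_])/;

for my $re ($quotedRE, $qrRE) {
    for my $s ('RedHat', 'see UniWorks here', 'RedHatGuide') {
        printf "%-20s %s\n", $s, ($s =~ /$re/ ? 'stopped' : 'linked');
    }
}
```

The POSIX form suggested above (`[[:^word:]]`) is also valid Perl regex syntax and sidesteps listing delimiters at all.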
--
ClifKussmaul - 08 Apr 2008
ForceStopWikiWordLinkPlugin is going well. I've added and verified a set of test cases.
The RE is now:
'(^|[[:^alnum:]])' . "($words)" . '(?=s?([[:^alnum:]]|$))'
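A quick way to see what this pattern does, assuming it is used with a `$1<nop>$2` substitution (an assumption; the plugin's actual replacement may differ) and a made-up word list: the optional `s?` in the lookahead lets plurals such as RedHats be stopped, RedHatGuide is still left alone, and since the trailing check is a lookahead nothing after the word is consumed.

```perl
use strict;
use warnings;

my $words = 'RedHat|UniWorks';    # hypothetical stop word list
my $re = '(^|[[:^alnum:]])' . "($words)" . '(?=s?([[:^alnum:]]|$))';

my $text = 'RedHat, RedHats and RedHatGuide';
(my $out = $text) =~ s/$re/$1<nop>$2/g;
print "$out\n";
```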
The bracketed link issue bothers me, though. It seems risky to insert nops or explicit links and then try to remove some of them without breaking other code.
Would it make more sense to
- insert some unique marker,
- identify and remove instances of the marker which should not change the page,
- identify and replace instances of the marker that should change the page.
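A minimal sketch of the three-step marker idea for the stop case. The marker string, the helper name `stopWordsWithMarker` and the choice of <nop> as the final replacement are assumptions for illustration, not taken from the attached plugin; a control character is used so the marker is unlikely to appear in topic text.

```perl
use strict;
use warnings;

my $MARKER = "\x01STOP\x01";    # hypothetical marker, unlikely in topic text

sub stopWordsWithMarker {
    my ($text, $words) = @_;

    # 1. Insert the marker in front of every stop word
    $text =~ s/(^|[[:^alnum:]])($words)(?=[[:^alnum:]]|$)/$1$MARKER$2/g;

    # 2. Remove markers inside the link part of [[link]] or
    #    [[link][label]]; the label part is left untouched
    $text =~ s{\[\[([^\]]*)\]}{
        my $link = $1;
        $link =~ s/\Q$MARKER\E//g;
        '[[' . $link . ']';
    }ge;

    # 3. Turn the remaining markers into <nop>
    $text =~ s/\Q$MARKER\E/<nop>/g;
    return $text;
}

my $result = stopWordsWithMarker('See RedHat or [[Installing RedHat][about RedHat]]', 'RedHat');
print "$result\n";
```

Only step 3 produces visible markup, so code that runs between the steps never sees a half-processed <nop>.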
--
ClifKussmaul - 13 Apr 2008
I've attached a current version of ForceStopWikiWordLinkPlugin.
It handles bracketed links using the approach described above.
The documentation topic includes test cases for both force and stop.
Comments welcome.
Should I a) leave it attached here, b) rename this plugin, or c) create a new plugin topic?
--
ClifKussmaul - 14 Apr 2008
I'm for renaming the enhanced plugin to something like ControlWikiWordLink(ing)Plugin.
--
FranzJosefGigler - 14 Apr 2008
I've attached an updated version of ForceStopWikiWordLinkPlugin.
I'd appreciate comments on whether it works, and suggestions for enhancing it.
I'd still like to know whether to a) leave it here, b) rename this topic, c) create a new topic.
--
ClifKussmaul - 09 May 2008
We are using this plugin and don't like the behavior of substring matching. For example, if you configure it not to link RedHat but create a topic called RedHatInstallProcedure, neither is linked, even though the second should be. Here is a change to two lines that fixes this:
$stopWordsRE = "(^|[\( \n\r\t\|])($stopWords)"; # WikiWord preceded by space, parens or pipe
to
$stopWordsRE = "(^|[\( \n\r\t\|])($stopWords)([\) \n\r\t\|])"; # WikiWord preceded and followed by space, parens or pipe
and the
$_[0] =~ s/$stopWordsRE/$1<nop>$2/g;
line to
$_[0] =~ s/$stopWordsRE/$1<nop>$2$3/g;
--
RickMach - 09 May 2008
I think the version I just uploaded handles the situation Rick describes, and also behaves correctly when stopwords are in headings or tables of contents. (I have test cases for each of these).
--
ClifKussmaul - 10 May 2008