Tags: linking
Recognizing topics and auto-linking from text with spaces, dash, and underscore rather than BumpyWords.

-- JeffPeck - 01 Nov 2005: added summary line above.

Is there any better way to do this? Bear with me while I describe "this".

I don't know if anyone has done this before; probably someone has. I haven't gone through all the code to understand the TWiki architecture the way I should have, but I've hacked the TWiki.pm code on our system anyway. Now, when we add a new topic, say "UnsolicitedOffer", TWiki automatically and dynamically creates links from any occurrences of "unsolicited offer", "unsolicited-offer", "unsolicited_offer", etc., in addition to the normal behavior of creating links from any occurrences of TWikiWord syntax: "UnsolicitedOffer".

This is to correct the most annoying feature of WikiWebs. Yes, a lot of people will think this will slow things down a lot or whatever. I really can't speak to their concerns when the Elohim have confounded our language so that we cannot even create a decently-linked glossary without bowing down to the almighty idol of WikiWordSyntax and chanting meaningless gibberish at each other in a futile attempt to communicate.

Here are the lines I added/modded (marked with "JAB") and a bit of their context:

    # TWiki concept regexes
    $wikiWordRegex = qr/[$upperAlpha]+[$lowerAlpha]+[$upperAlpha]+[$mixedAlphaNum]*/;
    $normalTopicRegex = join '|', #JAB
        sort {length($b)<=>length($a)}  #JAB
            map {TWiki::Store::getTopicNames($_)} &TWiki::Store::getAllWebs( "" ); #JAB
    $normalTopicRegex =~ s/([$lowerAlpha])([$upperAlpha])/$1\[\_\ \\\-\]$2/g; #JAB
    $webNameRegex = qr/[$upperAlpha]+[$lowerAlphaNum]*/;
...
                # 'TopicName#anchor' link:
                s/([\s\(])($wikiWordRegex)($anchorRegex)/&internalLink($1,$theWeb,$2,"$TranslationToken$2$3$TranslationToken",$3,1)/geo;

                # 'TopicName' link:
                s/([\s\(])($wikiWordRegex)/&internalLink($1,$theWeb,$2,$2,"",1)/geo;

                # 'topic name' link:  JAB
                s/([\s\(])($normalTopicRegex)/&internalLink($1,$theWeb,$2,$2,"",1)/gieo; #JAB
...
    # Turn spaced-out names into WikiWords - upper case first letter of
    # whole link, and first of each word. TODO: Try to turn this off,
    # avoiding spaces being stripped elsewhere - e.g. $doPreserveSpacedOutWords
    if($theTopic =~ /(^[$lowerAlpha])|[\s_\-]/){ #JAB
        $theTopic =~ tr/A-Z/a-z/; #JAB
        $theTopic =~ s/^(.)/\U$1/;
        $theTopic =~ s/[\s_\-]($singleMixedAlphaNumRegex)/\U$1/go;      #JAB
    }#JAB

    # Add <nop> before WikiWord inside link text to prevent double links
    $theLinkText =~ s/([\s\(])($singleUpperAlphaRegex)/$1<nop>$2/go;

-- JamesBowery - 07 Feb 2005
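
For readers who want to try the idea outside TWiki, the patch above can be sketched as a standalone script. This is only an illustration under stated assumptions: the topic list is a hard-coded stand-in for what the patch reads from TWiki::Store, the helper names (topic_pattern, build_topic_regex, to_wiki_word) are hypothetical, and the alphanumeric boundary checks and quotemeta call are additions that are not in the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the topic names the patch collects from TWiki::Store.
my @topics = qw(UnsolicitedOffer Offer WebHome);

# Turn one WikiWord into a pattern that requires a space, underscore or dash
# at each lower-to-upper case transition (as the #JAB substitution does).
sub topic_pattern {
    my ($topic) = @_;
    my $pattern = quotemeta $topic;                  # assumption: escape odd characters
    $pattern =~ s/(?<=[a-z])(?=[A-Z])/[_ -]/g;
    return $pattern;
}

# Join all topic patterns into one case-insensitive alternation, longest
# name first so that the longest phrase wins at any given position.
# Note: an empty topic list would yield a pattern that matches the empty string.
sub build_topic_regex {
    my @names = sort { length($b) <=> length($a) } @_;
    my $alternation = join '|', map { topic_pattern($_) } @names;
    return qr/(?<![A-Za-z0-9])($alternation)(?![A-Za-z0-9])/i;
}

# Map matched text back to a topic name, e.g. "unsolicited-offer" to
# "UnsolicitedOffer": lower-case everything, then capitalise each word.
sub to_wiki_word {
    my ($text) = @_;
    my $name = lc $text;
    $name =~ s/(?:^|[\s_-])([a-z0-9])/\U$1/g;
    return $name;
}

my $regex = build_topic_regex(@topics);
my $text  = 'We received an unsolicited offer and an unsolicited_offer.';
while ($text =~ /$regex/g) {
    printf "%s -> %s\n", $1, to_wiki_word($1);
}
```

Like the patch, to_wiki_word lower-cases the matched text before capitalising each word, so "DNS Server" would map to DnsServer.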

Interesting. I think this is a great idea, I would love to have really good phrase linking. It is not something all installations will need or want, so it needs to be done as a plugin rather than as a core patch. A plugin would also be easier to configure for different locales/languages.

Some other questions for you to consider/solve:

  1. If I create a topic named ToBe, do I really want all occurrences of the phrase "to be" to be linked to it?
    • Should there perhaps be some rules governing what words may be used to compose links?
  2. Let's say I have the phrase "sausage and chips" in TeaTopic. I then create a topic called "SausageAnd". The above code (if I understand it correctly) will tell me I have SausageAnd chips when I view TeaTopic. If I now create another topic AndChips, it will never be linked, because it is shorter, even though it is equally valid as a phrase link. If I create the topic SausageAndChips, all the old links to SausageAnd will disappear.
    • This behaviour may be OK in some cases, but isn't a general solution
    • really the linked phrase needs to offer all possible alternative targets for each portion of the phrase, maybe as a dropdown
  3. This exact code does a lot of work, and will slow down large installations an awful lot. At the very least you would need to cache the topic lists to accelerate matching.

-- CrawfordCurrie - 10 Feb 2005
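
The shadowing described in point 2 can be reproduced with a small standalone sketch. It assumes the same longest-first alternation as the patch; the function names phrase_regex and linked_phrases are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Build the alternation roughly the way the patch does: longest topic name
# first, with a separator class at each lower-to-upper case transition.
sub phrase_regex {
    my @topics = sort { length($b) <=> length($a) } @_;
    my @patterns = map {
        (my $p = quotemeta $_) =~ s/(?<=[a-z])(?=[A-Z])/[_ -]/g;
        $p;
    } @topics;
    my $alternation = join '|', @patterns;
    return qr/($alternation)/i;
}

# Return the phrases that would be linked in $text, in order of appearance.
sub linked_phrases {
    my ($text, @topics) = @_;
    my $regex = phrase_regex(@topics);
    my @phrases;
    push @phrases, $1 while $text =~ /$regex/g;
    return @phrases;
}

my $text = 'sausage and chips';

# With SausageAnd and AndChips, only "sausage and" is linked: the word "and"
# is consumed by the first match, so "and chips" is never considered.
print join(', ', linked_phrases($text, qw(SausageAnd AndChips))), "\n";

# Once SausageAndChips exists, it takes over the whole phrase.
print join(', ', linked_phrases($text, qw(SausageAnd AndChips SausageAndChips))), "\n";
```

The scan never revisits text that an earlier match has already consumed, which is why AndChips is shadowed regardless of how valid it is as a phrase link.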

I fixed a bug in the "if" statement's regex that made it fail to properly link lower-case references to single word topics.

As for the various conditions under which there may be conflicts between intended linkages:

Redundant links added to free-standing text aren't likely to incur any serious costs from a human standpoint so long as explicit links take priority.

I've been using this with our development team and there haven't been any cases where there was a need for explicit links except to external sites. It works quite well. We'll probably leave the code mods there until some plugin is developed.

-- JamesBowery - 12 Feb 2005

To turn this into a plugin is not hard, but requires the fix to the bug reported in TranslationTokenPassedIntoSubroutines.

However, I think that Crawford raises some important questions...

-- ThomasWeigert - 12 Feb 2005

I agree they are important questions for many, but not for our case, which I believe to be fairly typical of most TWiki installations: a development team running on a development server, hammering out terminology and specifications with a lot of changes going on. In that environment it's more important to generate links dynamically, since people need links to click through to clarify the technical meanings of new phrases more often, and people have less time to devote to making the pages nice.

To CrawfordCurrie's more specific concerns:

  1. There could be some way to flag certain topics as "explicit link only" so that if you have a fairly generic phrase as a topic name that is used rarely in its technical sense, compared to the generic usage, it will be linked to the topic only when explicitly linked.
  2. I'm not sure you understood what the code does when you said "I have SausageAnd chips when I view TeaTopic". Actually, what you have is [[sausage and]] chips when you view TeaTopic.
    • We obviously disagree about how frequently this feature would be used, but if it is a plugin, people who don't want it can simply not use it. I suspect it will end up being incorporated into the kernel once it becomes apparent how much people use it, but there's no reason to make an issue of it now.
    • Offering all possible links is a reasonable option but the default behavior of "greedy linking" (linking to the topic with the longest name) seems to work well for our purposes and it makes sense why it would: it is most informative.
  3. Yes, this code needs optimization for environments with high load. However, as I pointed out, the typical TWiki is a development installation, where load is far less of an issue. Perhaps a way around the optimization problem would be a generator which would insert [[explicit links]], so you could turn off the implicit linking when you were ready to go to production.

-- JamesBowery - 13 Feb 2005
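
The "explicit link only" flag from point 1 and the link generator from point 3 could be combined along the following lines. This is a sketch, not part of any patch in this topic: make_links_explicit and its arguments are hypothetical, the topic lists are passed in rather than read from the store, and a stock TWiki may only resolve the space-separated form inside [[...]].

```perl
use strict;
use warnings;

# Rewrite implicit phrase matches as explicit [[...]] links, skipping any
# topic flagged as "explicit link only". Both lists are hypothetical inputs.
sub make_links_explicit {
    my ($text, $topics, $explicit_only) = @_;
    my %skip = map { $_ => 1 } @{ $explicit_only || [] };

    # Longest names first, so the longest phrase wins (greedy linking).
    my @candidates = sort { length($b) <=> length($a) }
                     grep { !$skip{$_} } @$topics;
    return $text unless @candidates;

    my $alternation = join '|', map {
        (my $p = quotemeta $_) =~ s/(?<=[a-z])(?=[A-Z])/[_ -]/g;
        $p;
    } @candidates;

    # The first alternative swallows existing [[...]] links so they are left
    # untouched; the second wraps a free-standing phrase in brackets.
    $text =~ s/(\[\[.*?\]\])|(?<![A-Za-z0-9])($alternation)(?![A-Za-z0-9])/
        defined $1 ? $1 : '[[' . $2 . ']]'/gie;
    return $text;
}

my $page = 'An unsolicited offer is not to be ignored; see [[unsolicited offer]].';
print make_links_explicit($page, [qw(UnsolicitedOffer ToBe)], ['ToBe']), "\n";
```

Running such a generator once over a topic would let a site switch implicit linking off afterwards, at the cost of the links no longer tracking newly created topics.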

James, I think this is a great hack! It goes a long way to ameliorate the pain of making bizarre topic names (DnsServer?): with this I can at least refer to DNS Server in text and it gets linked. Plus, it automatically links normally spelled names like First Last to Main.FirstLast (getting the correct Web without special reminders)!

I modified your code to work with TWiki version 02 Sep 2004 $Rev: 1742 $, and to work with cross-web links and case-sensitive file names (your version worked on my Windows machine, but fails to find -e files on Linux).

A simple performance improvement would be to limit the search to the current web, or to a set of %AUTOLINKWEBS%.

My next hack might be to match text against #Anchor references in some special topic (DefinedPhrases?) and create links to those anchors. Combined with the AnchorToolTipSummary patch, this would create a nice tooltip-powered glossary without creating a topic for each defined phrase (I know... that is so anti-wiki...)

-- JeffPeck - 27 Oct 2005

Very interesting. Does the said TranslationTokenPassedIntoSubroutines problem still exist on Dakar?

-- MartinCleaver - 28 Oct 2005

I don't know. I just got here :-)

The patches I attached are not a plugin; they are just hacks to Render that search the rendered text for occurrences of ( One[-_\s]Topic | Another[-_\s]Topic | ... ) over all topics in all webs (inserting the [-_ ] at any lower-to-upper case transition, and surely that can be generalized). The specialized entry to internalLink then looks for exactly which Web and Topic matches the text.

-- JeffPeck - 28 Oct 2005
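
The lookup step described above (finding which Web and Topic a matched phrase belongs to, within a restricted list of webs) might look roughly like this. It is a sketch under assumptions: %topics_by_web is a hypothetical in-memory stand-in for the store, the web and topic names are examples, and lookup_key and resolve_phrase are not functions from the attached patches.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the store: web name => topic names.
my %topics_by_web = (
    Main  => [qw(FirstLast WebHome)],
    Codev => [qw(DnsServer WebHome)],
);

# Normalise matched text or a topic name into a comparison key:
# lower case, with spaces, underscores and dashes removed.
sub lookup_key {
    my ($name) = @_;
    (my $key = lc $name) =~ s/[\s_-]+//g;
    return $key;
}

# Find the web and topic for a matched phrase. The current web is tried
# first, then the remaining webs in the order given. The topic name is
# returned with its stored capitalisation, which matters on file systems
# with case-sensitive names.
sub resolve_phrase {
    my ($phrase, $current_web, @link_webs) = @_;
    my $wanted = lookup_key($phrase);
    for my $web ($current_web, grep { $_ ne $current_web } @link_webs) {
        for my $topic (@{ $topics_by_web{$web} || [] }) {
            return "$web.$topic" if lookup_key($topic) eq $wanted;
        }
    }
    return;    # no topic in the allowed webs matches
}

for my $phrase ('First Last', 'dns-server', 'web home') {
    my $target = resolve_phrase($phrase, 'Codev', qw(Main Codev));
    printf "%s -> %s\n", $phrase, defined $target ? $target : '(no link)';
}
```

Passing a shorter web list to resolve_phrase is the per-web restriction suggested earlier in this thread; a real implementation would probably build the key-to-topic map once per request instead of rescanning each list.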

I must have missed this topic because of the topic name. I would love to see this as a plugin, if possible.

-- ArthurClemens - 28 Oct 2005

Me too. (OK, it's corny)

-- AntonAylward - 28 Oct 2005

What do you reckon then, Plugins.ImplicitLinkPlugin?

-- MartinCleaver - 29 Oct 2005

This is related to the FlexibleWikiWords feature that turns [[some word]] into a some word link pointing to SomeWord. This feature has ramifications:

  • Backlink search is more complicated
  • Topic rename might miss some backlinks

Wikipedia had WikiWord linking in the beginning but changed soon to explicit [...] linking where [some word] points to Some_word, and WikiWords are not linked.

-- PeterThoeny - 02 Nov 2005

Peter, thanks for the pointer to the full discussion.

Meanwhile, I modified the patch (in case anyone else wants to try it out). All the code has moved to RenderDotPm, some work is avoided when noAutoLink is set, and the search covers a restricted set of %LINKWEBS% (defaulting to the searchable webs), so each web (or topic) can restrict the range. There is also a place for a hook to restrict the set of topics allowed to be auto-linked.

-- JeffPeck - 02 Nov 2005

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| Render-normalTopicRegex.patch | r1 | 2.0 K | 2005-10-27 - 23:20 | JeffPeck | |
| Render-normalTopicRegex2.patch | r1 | 4.3 K | 2005-11-02 - 22:02 | JeffPeck | Single patch to RenderDotPm |
| TWiki-normalTopicRegex.patch | r1 | 0.9 K | 2005-10-27 - 23:20 | JeffPeck | |
Topic revision: r18 - 2006-04-17 - WillNorris
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.