Dangling Links Tool Needed

I would like to have a tool - probably a plugin - that could find all of the dangling links in a TWiki web, i.e. all of the WikiWords that point to pages that have not yet been created in the current web. Ideally it would also work on a chosen set of TWiki pages, or across multiple webs. TWiki itself does this naturally and fairly conveniently on a single page, although even there I might like to have a nice summary.

Well-managed web sites have tools that do this for all links - HTML links, etc. However, it would be nice to have a TWiki-specific version. A generic web link checker reports a full URL, which can be a hell of a lot harder to interpret than the simple WikiWord.

REASON: I am trying to write a document, an on-line manual or specification, that may eventually get published - either on the web or possibly on paper. In the subset of the document that will get published there should be no dangling links. I need to hunt down such links and correct them.

I find that many of these dangling links are typos - e.g. I type HeresALink rather than HereIsALink. Or, similarly

[[Here's a link]]
rather than
[[Here is a link]]

I envisage such a tool creating a list of links that both exist and do not exist, to allow comparison. For the links that do not exist, a search command might be created that jumped to each location, and allowed edit. Or, global search and replace, perhaps with check boxes. (I think there's a global search and replace plugin.) Such a search for the links that do exist would be good, too.
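A minimal sketch of the kind of report described above, assuming a simplified WikiWord pattern (this is not TWiki's own regex) and a hash of existing topic names supplied by the caller. The name classify_links is hypothetical, and double bracket links are deliberately not handled here.

```perl
use strict;
use warnings;

# Simplified WikiWord pattern (an assumption, not TWiki's own regex):
# a capitalised run followed by at least one more, e.g. HereIsALink.
my $wikiword = qr/\b[A-Z]+[a-z0-9]+(?:[A-Z]+[a-z0-9]*)+\b/;

# Split the WikiWords found in $text into those that exist and those
# that dangle, given a hash ref of topic names that exist in the web.
sub classify_links {
    my ($text, $existing) = @_;
    my (%found, %dangling);
    while ($text =~ /($wikiword)/g) {
        my $word = $1;
        if ($existing->{$word}) { $found{$word}++ }
        else                    { $dangling{$word}++ }
    }
    return ([sort keys %found], [sort keys %dangling]);
}

my %topics = map { $_ => 1 } qw(HereIsALink WebHome);
my ($ok, $missing) = classify_links(
    "See HereIsALink and HeresALink from WebHome.", \%topics);
print "exist:    @$ok\n";
print "dangling: @$missing\n";
```

In a real run the hash of existing topics would probably be built from the topic files of the web rather than typed in by hand.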

Fixing such dangling links is one of the things that I use EMACS on the raw TWiki files for, as opposed to the web browser interface.


I am hopeful that such a plugin already exists, although I have spent several hours searching to no avail.


By the way, if such a plugin already exists, please, Please, PLEASE email me: mailto:andy.glew.delete.this.primitive.spam.guard@intelPLEASENOSPAM.com. I probably will not read this TWiki again as quickly as I want to find this plugin.

  • EMAIL SENT -- Sven

-- AndyGlew - 16 Mar 2006

See WantedTopics, FindReferencedButNotDefinedWikiWords, ListingAllUndefinedButUsedWikiWords, HowToFindOrphanedTopics

-- PeterThoeny - 16 Mar 2006

Yet Another Script to Find Dangling Links

AndyGlew - 22 Mar 2006:

I started out with ListingAllUndefinedButUsedWikiWords, fixed some of its most annoying bugs, and then reworked it completely. I will post the bug fixes to ListingAllUndefinedButUsedWikiWords there; I will post my new tool here. IMHO the new tool is a step in the right direction, but it raises some issues.

I am not going to package it nicely at this time; I have people yelling at me already.

The old script had problems in that it did not understand all of the TWiki link syntax. It did not understand [[ double bracket links ]]. It did not understand Web.CrossWebTopic or Web.SubWeb.NestedLinks. This produced a lot of bogus data when I ran it on my wiki webs, where pages have frequently moved between webs.
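To illustrate the cross-web case, here is a small sketch of splitting a possibly web-qualified link into a web and a topic. It assumes that the last dot-separated component is the topic and that everything before it names the web (with subwebs joined by "/"); split_link and the web name Main are illustrative only, and this is not how TWiki itself parses links.

```perl
use strict;
use warnings;

# Split a possibly web-qualified link into (web, topic).
# $current_web is used when the link carries no web prefix.
sub split_link {
    my ($link, $current_web) = @_;
    my @parts = split /\./, $link;
    my $topic = pop @parts;
    my $web   = @parts ? join('/', @parts) : $current_web;
    return ($web, $topic);
}

for my $link (qw(LocalTopic Web.CrossWebTopic Web.SubWeb.NestedLinks)) {
    my ($web, $topic) = split_link($link, 'Main');
    print "$link: web=$web topic=$topic\n";
}
```

A script that only looks for bare WikiWords treats all three of these as topics in the current web, which is where the bogus data comes from.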

I believe that the underlying problem is the code smell of replication, violating the principle of OAOO (Once and Only Once): TWiki itself has to understand its link syntax. It smells bad to try to create a separate module to understand the link syntax. If TWiki extended the link syntax (again), the separate tool would have to be updated.

Using TWiki::Render Standalone

I therefore tried to use TWiki itself to recognize link syntax. I did this by trying to use the TWiki::Render package in my tool. Unfortunately, TWiki's packages are highly interdependent - TWiki/Render.pm needs TWiki.pm, and TWiki.pm needs just about all of the rest of the TWiki packages. These modules are not really modular - they are hard to use in isolation, or even to test in isolation.

But... Perl allows us to play some tricks that may seem dirty, but which get the job done. Using Michael Feathers's terminology, Perl allows us to create "seams" for testing and reuse. This is done by My_TWiki_Renderer.pm, from which I extract the key part below.

Basically, I create a class My_TWiki_Renderer which inherits from TWiki::Render. But I have this class interact with a Runt_TWiki_Session rather than a full TWiki.pm session; and Runt_TWiki_Session itself stubs out some of the annoying cross module dependencies.

My_TWiki_Renderer::render_text_extracting_links calls TWiki::Render::getRenderedVersion. It therefore uses almost all of the machinery of TWiki::Render... except that it intercepts _renderWikiWord. Instead of calling TWiki::Render::_renderWikiWord it calls My_TWiki_Renderer::_renderWikiWord, which records the links in a list that I process later.

I needed to make one small change to TWiki::Render: I needed to have it invoke _renderWikiWord via ->method syntax, $this->_renderWikiWord(...) rather than free function syntax _renderWikiWord($this,...). This allowed the substitute My_TWiki_Renderer::_renderWikiWord to be called.
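The difference between the two call styles can be seen in a toy example. The packages Base_Renderer and Derived_Renderer below are hypothetical stand-ins, not TWiki code; the sketch only shows that a plain function call is bound to the package it was compiled in, while a method call is looked up through the class of the object.

```perl
use strict;
use warnings;

package Base_Renderer;
sub new { my $class = shift; return bless {}, $class }
sub _render { return "base" }
# Plain function call: always resolves to Base_Renderer::_render.
sub render_by_function { my $this = shift; return _render($this) }
# Method call: resolved through the class of $this, so a subclass can override it.
sub render_by_method   { my $this = shift; return $this->_render() }

package Derived_Renderer;
our @ISA = ('Base_Renderer');
sub _render { return "derived" }

package main;
my $r = Derived_Renderer->new;
print $r->render_by_function, "\n";   # prints "base"
print $r->render_by_method, "\n";     # prints "derived"
```

Only the second style gives a subclass the chance to substitute its own _render, which is the seam the change above opens up.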

{
    package My_TWiki_Renderer;
    use TWiki::Render;
    our @ISA = ('TWiki::Render');

    # Build a renderer backed by a stub session rather than a full TWiki.pm session.
    sub new {
        my $class = shift;
        my $this = {};
        bless $this, $class;
        $this->{session} = Runt_TWiki_Session->new;
        $this->{wiki_link_list} = Wiki_Link_List->new;
        return $this;
    }

    # Render $text purely for its side effect: every link the renderer
    # finds is passed to _renderWikiWord below, which records it.
    sub render_text_extracting_links {
        my $this = shift;
        my ($text,$web,$topic) = @_;
        $this->{current_web} = $web;
        $this->{current_topic} = $topic;
        # Call TWiki::Render::getRenderedVersion
        $this->getRenderedVersion($text,$web,$topic);
        delete $this->{current_topic};
        delete $this->{current_web};
    }

    # Overrides TWiki::Render::_renderWikiWord: record the link instead of rendering it.
    sub _renderWikiWord {
        my ($this, $theWeb, $theTopic, $theLinkText, $theAnchor, $doLinkToMissingPages, $doKeepWeb) = @_;
        #print "calling _renderWikiWord(-,'$theWeb','$theTopic','$theLinkText','$theAnchor',...)\n";
        # filter out links known to be bogus
        # TBD - why is TWiki reporting these to me?
        return if( $theTopic =~ m/^[A-Z0-9]*$/ );
        # Track link
        $this->{wiki_link_list}->add(web=>$theWeb,topic=>$theTopic,anchor=>$theAnchor,
                                     in_current_web=>($this->{current_web} eq $theWeb),
                                     );
    }

    1;
}
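The excerpt relies on a Wiki_Link_List class that is not shown here. A minimal sketch of what such a class might look like follows, assuming only what the excerpt uses: a constructor and an add method that takes named arguments. The topics method is a hypothetical addition for illustration, and the class in the attached script may well differ.

```perl
use strict;
use warnings;

package Wiki_Link_List;

sub new {
    my $class = shift;
    return bless { links => [] }, $class;
}

# Record one link; the arguments are the same named pairs that
# My_TWiki_Renderer::_renderWikiWord passes to add().
sub add {
    my ($this, %link) = @_;
    push @{ $this->{links} }, {%link};
    return;
}

# Return the distinct "Web.Topic" names recorded so far, sorted.
sub topics {
    my $this = shift;
    my %seen;
    $seen{"$_->{web}.$_->{topic}"} = 1 for @{ $this->{links} };
    return sort keys %seen;
}

package main;
my $list = Wiki_Link_List->new;
$list->add(web => 'Main',  topic => 'HereIsALink', anchor => '', in_current_web => 1);
$list->add(web => 'Main',  topic => 'HereIsALink', anchor => '', in_current_web => 1);
$list->add(web => 'Other', topic => 'WebHome',     anchor => '', in_current_web => 0);
print join("\n", $list->topics), "\n";
```

Keeping the raw records and deduplicating only when reporting leaves room for later checks, such as comparing the recorded topics against the files that actually exist.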

How Bad Is This?

Is this good or bad?

A: both.

I maintain that it is smelly to violate OAOO, and to maintain, inconsistently, lots of different versions of the code that recognizes wiki link syntax.

But the way I accomplished this above is a hack. It's a hack because, if somebody adds still more interdependencies between TWiki::Render and other modules, the way I wrote the code above will break - I might have to add more stubs for more TWiki classes.

I.e. it is a hack even though something like the above is really how the code should be designed. The above might be a step toward cleanly refactoring the TWiki packages - they should not be so interdependent. But an outsider should not make such blatant assumptions about interdependence.

Furthermore, it's a hack because I really don't know why it is working. I don't know what is happening to all of the other stuff that TWiki::Render does. I expected to have to provide stubbed output routines - but I don't seem to have needed to. I.e. it's a hack because I created a "seam" along an undocumented internal interface.

Sorry about that.

Nevertheless, I hope that something like this seam can be more officially supported by TWiki. It is good to avoid replication.

Features and Deficits

:-) The new script seems to handle all of the TWiki link syntax I have thrown at it.

:-( Unfortunately, for some unknown reason it also seems to recognize ALLUPPERCASEWORDS as TWiki links. I have added a kluge to filter them out, but some other extraneous stuff still sneaks through. (I have no idea why _renderWikiWord would be called on non-wiki-words.)

:-) The new script handles cross-web linking and Web.Subweb.Path syntax.

:-( I hardwire knowledge of TWiki's directory structure into the script, searching for .*/data/.* to figure out cross-web links. Although this works fine for TWiki sites, I have a dream of using TWiki syntax for document files scattered all throughout my directory tree, e.g. in C, C++, Perl, and RTL source code.

:-) The new script recognizes dangling links; it also finds orphan files (no links to a file), and, for good measure, prints out a list of properly set up files.

:-) The new script has arguments -directory, -recurse, and -file.

:-( The new script needs more command line arguments for particular things you may want to filter in or out - e.g. disable printing the list of orphan files, etc.

:-) The new script is a UNIX (or Cygwin) command line interface program. I run it from cron.

:-( The new script should be installable in the TWiki website. But I am not allowed to do that at Intel, so I can't test it.

:-( The new script should also be configurable to be a Plugin. (I'm allowed to do that - go figure - but haven't gotten around to it yet.)

:-( The new script does not recognize %INCLUDE{SomeLink}%, nor the EmbedTopicPlugin (which I depend on) %BeginTopic{SomeLink}%, nor an extension that I hope to see, %META:TOPICCHILD{name="SomeLink"}%. (I hope to use META:TOPICCHILD so that I can create proper printouts of an entire TWiki web or website.)

TWiki::Render::_renderWikiWord's behaviour for these is surprising. It doesn't appear that I will be able to piggyback on TWiki::Render to recognize these. I am therefore thinking, once again, about building a link parser outside of TWiki.
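As a rough idea of what such an outside parser might do for these three constructs, here is a sketch that pulls the target topic out of each. The regular expression is an assumption based only on the syntax as written above (optional quotes and an optional name= are accepted for all three), and extract_variable_links is a hypothetical name; it is not derived from TWiki's own parsing.

```perl
use strict;
use warnings;

# Extract target topics from %INCLUDE{...}%, %BeginTopic{...}% and
# %META:TOPICCHILD{...}%. Assumed syntax: an optional name=, optional
# double quotes, then a topic name made of word characters and dots.
sub extract_variable_links {
    my ($text) = @_;
    my @links;
    while ($text =~ /%(INCLUDE|BeginTopic|META:TOPICCHILD)\{\s*(?:name=)?"?([\w.]+)"?[^}]*\}%/g) {
        push @links, { kind => $1, target => $2 };
    }
    return @links;
}

my $sample = join "\n",
    '%INCLUDE{SomeLink}%',
    '%BeginTopic{Web.OtherLink}%',
    '%META:TOPICCHILD{name="ChildLink"}%';
printf "%-16s %s\n", $_->{kind}, $_->{target} for extract_variable_links($sample);
```

This is exactly the kind of replicated syntax knowledge that the OAOO argument warns against, so it would need updating whenever these variables change.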

Conclusion

The new script does its job nicely, but will be fragile if TWiki does not make "official" the behavior the new script depends on.

I have spent all the time I have to spend on this, this week. I won't be able to work on it again for a while.

-- AndyGlew - 22 Mar 2006

End Matter

Much to my surprise, I have learned that ALLUPPERCASEWORDS are TWiki links - in my recently updated TWiki installation, and on twiki.org.

I haven't found this documented anywhere on twiki.org - but this is likely due to the usual search limitations - so I am documenting what I have learned myself in TWikiLinkTypes.

-- AndyGlew - 31 Mar 2006

There is another way to do this, but it is a little slow :)

-- PavelKotrc - 24 May 2006

THISISALINK?

-- MeredithLesly - 24 May 2006

Nope.

-- MeredithLesly - 24 May 2006

Abbreviations are a little special as they will go active if a matching page is found, but a suggestion to create the page is not shown if a topic is not found (as with normal wikiwords). See ExportMoreRegexes (abbrevRegex) / WhatAreABBREVLinks.

-- SteffenPoulsen - 24 May 2006

See also PluginLinkHandler

-- CrawfordCurrie - 29 May 2006

See also Plugins.TopicReferencePlugin. Way back in April 2006, JeffCrawford wrote

I plan to extend it a bit more to list references to non existing topics
 (dangling references) in future when I have some more time.

I've been moving pages between two webs. I don't need an orphan checker nearly as much as I need a dangling reference checker. Has anyone worked this up?

Lots of scripts posted here, but I don't have server access and am unlikely to get it!

p.s. both dangling links and orphan checking should be built into TWiki!

-- VickiBrown - 09 Jul 2007

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| My_TWiki_Renderer.pm | r1 | 3.6 K | 2006-03-22 - 22:36 | AndyGlew | My_TWiki_Renderer - a hack to make TWiki::Render usable by a standalone application |
| findnontopics.pl.txt | r1 | 2.7 K | 2006-05-24 - 10:35 | PavelKotrc | |
| findunwrittentwikipages-old.pl.txt | r1 | 5.3 K | 2006-03-22 - 22:34 | AndyGlew | original script, derived from ListingAllUndefinedButUsedWikiWords |
| findunwrittentwikipages.pl.txt | r1 | 6.3 K | 2006-03-22 - 22:35 | AndyGlew | New version of tool, using TWiki::Render to recognize the link syntax |
| test-my-twiki-renderer.pl.txt | r1 | 3.8 K | 2006-03-22 - 22:37 | AndyGlew | a test for My_TWiki_Renderer.pm |
Topic revision: r12 - 2007-07-09 - VickiBrown
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.