Currently TWiki has no way of FindingUnwrittenButLinkedPages, as per the question ListingAllUndefinedButUsedWikiWords asked by MartinCleaver.

As a first pass I've written a simple script that goes away and collates the following info:

  • WikiWord, Use count, Existing or not, List of existing pages that reference the WikiWord

This is currently done on the fly, but it might be nice to do things in a more "make" style of method, and to re-use the stored info if it hasn't changed since the table was last generated.

The existing script I've knocked up looks like this, but probably needs redoing to be really useful:

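#!/usr/bin/env perl
use strict;
use warnings;

my $webDataLocation = "/usr/local/httpd/twiki/webs/Projects/data";
my (%seen, %seenIn, @exists, @notexists);

opendir(WEBDIR, $webDataLocation) or die "Cannot open $webDataLocation: $!";
while (my $file = readdir(WEBDIR)) {
   next unless ($file =~ /\.txt$/);
   open(IN, "$webDataLocation/$file") or next;
   my $slurp = join (" ", <IN>);
   close IN;
   $slurp =~ s/[^a-zA-Z0-9 ]/ /g;   # replace punctuation with spaces
   $slurp =~ s/\s+/ /g;
   foreach my $word (split(/\s+/, $slurp)) {
      if ($word =~ /^[A-Z]+[^A-Z]+[A-Z]+[^A-Z]+$/) {   # crude WikiWord test
         $seen{$word}++;              # use count
         $seenIn{$word}{$file} = 1;   # topic file that references the word
      }
   }
}
closedir(WEBDIR);

foreach my $word (keys %seen) {
    if ( -e "$webDataLocation/$word.txt") {
        push (@exists, "$seen{$word} : $word ref'd by : " . (join(" ", sort keys %{$seenIn{$word}})) . "\n");
    } else {
        push (@notexists , "$seen{$word} : $word ref'd by : " . (join(" ", sort keys %{$seenIn{$word}})) . "\n");
    }
}

# Sort report lines by their leading use count, highest first.
sub by_count { ($b =~ /^(\d+)/)[0] <=> ($a =~ /^(\d+)/)[0] }
my $EXISTS = join ("", sort by_count @exists);
my $NOTEXISTS = join ("", sort by_count @notexists);

print <<REPORT;
Twiki Topics Referenced that have Topics defined
$EXISTS
Twiki Topics Referenced that need Writing
$NOTEXISTS
REPORT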

It's probably of some use as is, but needs a lot of tarting up and persistent memoisation to be properly useful I think.

-- MichaelSparks - 20 Jul 2001
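
The "make" style reuse mentioned above could start from a staleness check like the following. This is only a sketch: it assumes the generated table is cached in a file and that file modification times are a good enough signal of change. The name report_is_stale and the idea of a cache file are hypothetical, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: return true when the cached report must be rebuilt,
# i.e. it does not exist yet or at least one topic file is newer than it.
sub report_is_stale {
    my ($report_file, @topic_files) = @_;
    return 1 unless -e $report_file;
    my $report_mtime = (stat $report_file)[9];   # element 9 of stat is mtime
    for my $topic (@topic_files) {
        my $mtime = (stat $topic)[9];
        next unless defined $mtime;              # skip files that have vanished
        return 1 if $mtime > $report_mtime;
    }
    return 0;
}
```

A caller would pass the cache file and the list of .txt files in the web's data directory, and only rescan when the function returns true. Deleted topics are not detected by this check, so a real version would probably need to store the topic list as well.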

Great. What say we make this a regular report as per WebStatistics?

-- MartinCleaver - 22 Jul 2001

I just noticed this. I think it's also useful to find weakly linked pages: things near the bottom of the existing list are those that might be hard for people to find using a normal "stumbling" pattern.

-- MikeMaurer - 14 Aug 2003
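
The weakly linked idea above could be sketched as a filter over the counts the script collects. The sub name weakly_linked and the threshold parameter are hypothetical; the two hashes stand in for the use counts and for the "exists" test on topic files.

```perl
use strict;
use warnings;

# $seen   : hash ref, WikiWord => use count
# $exists : hash ref, WikiWord => true if the topic already has a page
# $max_refs is a hypothetical cut-off for "weakly linked".
# Returns existing topics with at most $max_refs uses, least used first.
sub weakly_linked {
    my ($seen, $exists, $max_refs) = @_;
    return sort { $seen->{$a} <=> $seen->{$b} or $a cmp $b }
           grep { $exists->{$_} && $seen->{$_} <= $max_refs }
           keys %$seen;
}
```

Note that a topic nobody references never shows up in the counts at all, so truly orphaned pages would need a separate pass over the data directory.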

Some tasks crop up again and again: A clean way to FindAllLinksInPage is necessary for

The problem is that there are many ways to create links. (Hmm... this is a wiki -- it's a feature, not a problem!) To definitely, positively, absolutely cover all cases, you have to apply all the rules that TWiki does, including those of all installed plugins.

This is not feasible outside TWiki, so it must be delegated to the TWiki renderer itself. Ways to do this:

  1. let TWiki render the page completely into a naked template; then de-render all links: sounds ugly and costly in terms of run-time
  2. provide hooks for all functions emitting links; then you could register to collect all links as you go by: probably too many places to change

The make-approach could make the run-time cost bearable. Maybe combined with an on-save hook and fields in ADatabaseCalledTWiki? If something comes out of this, I'll be glad to get the missing links into the TouchGraphAddOn.

-- PeterKlausner - 14 Aug 2003
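
For the first option above (render, then de-render), the de-rendering step might look roughly like this. It is a sketch only: it assumes the rendered page is already available as a string of HTML with double-quoted href attributes, and it uses a regex where a real HTML parser would be more robust. The sub name extract_hrefs is hypothetical, and how to tell a "missing" link from a normal one depends on the markup the renderer emits, which is not covered here.

```perl
use strict;
use warnings;

# Collect the distinct href targets of <a> tags in rendered HTML,
# in order of first appearance. Only double-quoted hrefs are handled.
sub extract_hrefs {
    my ($html) = @_;
    my %seen;
    my @links;
    while ($html =~ /<a\b[^>]*?\bhref\s*=\s*"([^"]*)"/gi) {
        push @links, $1 unless $seen{$1}++;
    }
    return @links;
}
```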

A 3rd solution: make a hook that is called when an unresolved link is found:

  • It should not slow TWiki on normal pages
  • then you can have a plugin storing all the missing links, triggered by a web-crawling of the site
This could also be helpful to implement "catch-alls" to catch redirected pages or webs.

-- ColasNahaboo - 14 Aug 2003

Nice work. :-) But it generates too many hits on the "non existing" side since, e.g., author names in meta data are normally not fully qualified ("Main."). Ignoring meta data could work...

-- OliverKrueger - 30 Dec 2004
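
Ignoring meta data, as suggested above, could be sketched as a filter applied to the topic text before the WikiWord scan. This assumes the meta data sits on lines beginning with %META: in the topic's .txt file; the sub name strip_meta is hypothetical.

```perl
use strict;
use warnings;

# Remove every line that starts with %META: so that author names and
# similar fields are not counted as WikiWord uses.
sub strip_meta {
    my ($text) = @_;
    $text =~ s/^%META:[^\n]*\n?//mg;
    return $text;
}
```

In a script like the one at the top of this topic, this would have to run on the slurped text before punctuation is replaced with spaces, since that step destroys the %META: prefix. The file would also need to be joined without the extra space between lines, so that each meta line still starts at the beginning of a line.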

Topic revision: r9 - 2004-12-30 - OliverKrueger