Refactoring Proposal: Repair some of the quirks with TWiki's Anchors

Motivation

There are a couple of Support and Codev topics about TWiki's %TOC% handling failing when headings with identical text occur in a topic. But this is only one of the more visible ways in which %TOC% is broken, and the problem actually lies not only in the processing of %TOC% but in the general handling of anchors (link targets within a page).

It will take some time to fix it, so why not write down what we've learnt so far?

Description

While walking through the code of TWikiRelease04x01, I describe the mechanism of the current TOC handling and where and why it fails. This is more a design document than a feature request. It may come in handy if enhancements to TOC like Bugs:Item3814 are investigated.

Impact and Available Solutions

Documentation

For more than five years TWiki has been living with a problem: links created in the table of contents (%TOC% variable, see VarTOC2) fail to work if different headings have the same text. This usually doesn't occur with top-level headings, but longer topics occasionally have several sections with the same subheadings in each.

The reason is that the anchors pointing to the headings are currently created from the text of the heading only, so identical heading texts yield identical anchors. This is one of the (very few) situations where TWiki creates invalid XHTML.
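To make the collision concrete, here is a minimal Python sketch. The function make_anchor_name is a simplified, hypothetical stand-in for a context-free anchor routine; it is not TWiki's actual Perl code, and its character rules are an assumption chosen only to reproduce the "Heading_Text" style of anchor.

```python
import re
from collections import Counter

def make_anchor_name(heading_text):
    """Hypothetical context-free routine: the result depends on the text only."""
    # Assumption: runs of non-alphanumeric characters become a single underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", heading_text).strip("_")

# Headings as they might appear in one topic, with one subheading repeated.
headings = ["Motivation", "Description", "Examples", "Motivation"]
anchors = [make_anchor_name(h) for h in headings]

# Any anchor name handed out more than once is a broken link target.
duplicates = sorted(name for name, count in Counter(anchors).items() if count > 1)
print(duplicates)
```

Because the routine knows nothing about the headings it has already seen, both "Motivation" sections end up with the same anchor name.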

When looking at the code, it occurred to me that there are other shortcomings, and indeed there are:

  1. Topic authors can add their own anchors to the text, using #WikiWord syntax. There is no mechanism to prevent duplicate anchors created by topic authors, and there is no mechanism to prevent explicit #WikiWord anchors from overlapping with anchors created for %TOC%.
  2. One can specify an alternate topic for which the table of contents is to be shown: %TOC{"OtherTopic"}%. Most of the time such a table of contents looks identical to what you get if you expand the TOC in the topic itself, but there are exceptions: Headings which are expanded by TWiki variables (plugin variables, but also %INCLUDE{...}% or a bookview search) are only shown if the TOC is in the same topic. This is insidious, since missing lines in the table of contents are far less obvious than entries which link to the wrong heading.

Examples

Just take the following subsection as an example why a fix would come in handy:

Motivation

You cannot jump to this section from the table of contents, can you?

Current Implementation

Currently, creating a table of contents is done in two separate modules:

  • In lib/TWiki.pm the %TOC% variable is expanded, by reading the topic line-by-line and creating links to the headings where appropriate. However, if the topic is not the current one, then the topic text is read "as is", without any variable expansion.
  • Later in the processing of a request, in lib/TWiki/Render.pm, the headings are expanded. They get their <a name="Heading_Text"></a> anchors during that phase. This needs to be done regardless of whether there's a %TOC% in the topic, because the %TOC% could be in any other topic. Sometimes (some "compatibility" thing which I haven't understood yet) two anchors are created for some headings.

Requirement: Anchor creation and link creation must stay in synch

If a table of contents is created for another page, then the link in the TOC and the anchor which serves as a link target are not created in the same HTTP request. To make sure that the TOC entries actually point to existing anchors, the algorithm which creates both attributes must be the same in lib/TWiki.pm and lib/TWiki/Render.pm. The current code ensures this by using the same subroutine to create anchor names and creating them "context-free", passing only the text of the heading.

If that routine were to disambiguate anchor names which come out identical, it would need some sort of "context", a memory of all anchors it has created so far. A simple Perl hash will do.
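As an illustration of such a memory, here is a minimal Python sketch (a set instead of a Perl hash). The class name AnchorRegistry, the helper make_anchor_name, and the numeric "_2" suffix format are all assumptions made for this example; none of them is taken from the TWiki code.

```python
import re

def make_anchor_name(heading_text):
    """Hypothetical context-free base name (simplified, not TWiki's routine)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", heading_text).strip("_")

class AnchorRegistry:
    """Remembers every anchor name handed out so far within one page."""

    def __init__(self):
        self._used = set()

    def unique_anchor(self, heading_text):
        base = make_anchor_name(heading_text)
        candidate, n = base, 1
        # The first occurrence keeps the plain name; later ones get a suffix.
        # Looping also guards against clashes with a real heading like "Motivation 2".
        while candidate in self._used:
            n += 1
            candidate = f"{base}_{n}"
        self._used.add(candidate)
        return candidate

registry = AnchorRegistry()
names = [registry.unique_anchor(h) for h in ["Motivation", "Description", "Motivation"]]
print(names)
```

The registry has to be created once per page and fed the headings in document order, which is exactly the "context" the current context-free routine lacks.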

But wait: If links and anchors are created in separate HTTP requests, both need to build the same memory. This requires that both see the same headings. As we've seen, this is not the case with the TOC for another topic, if that topic creates headings through %INCLUDE{...}%. Additionally, the renderer creates anchor names not only for anchors in the same topic, but also for anchored links elsewhere, as in [[Web.SomeTopic#Anchor]]. These anchors must be left "as provided" and not disambiguated.
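The following Python sketch shows how the two passes can drift apart. It assumes, purely for illustration, that repeated anchors are numbered in order of appearance; the function assign_anchors, the heading lists, and the suffix format are hypothetical.

```python
import re

def assign_anchors(headings):
    """Return (heading, anchor) pairs, suffixing repeats in order of appearance."""
    used, result = set(), []
    for text in headings:
        base = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        result.append((text, candidate))
    return result

# TOC pass for OtherTopic: text is read "as is", so the include is not expanded.
raw_headings = ["Overview", "Examples"]
# Renderer pass for OtherTopic: a hypothetical include contributes its own
# "Examples" heading before the topic's own "Examples" heading.
rendered_headings = ["Overview", "Examples", "Examples"]

toc_links = dict(assign_anchors(raw_headings))
rendered = assign_anchors(rendered_headings)

toc_target = toc_links["Examples"]   # where the TOC entry points
actual_anchor = rendered[-1][1]      # anchor of the topic's own heading
print(toc_target, actual_anchor)
```

In this scenario the TOC entry for the topic's own "Examples" section silently lands on the included heading, because the two passes built different memories.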

Target Implementation

Proposal 1: Make Anchors Unique

As proposed in FreetownReleaseMeeting2007x03x12: The first anchor for every "autogenerated" content stays as it is today, only duplicates are modified. This will keep 99% of today's TOCs and links to headings unchanged, and those which are changed have been broken anyway: duplicate anchors are useless for linking to, and break XHTML validity.

Proposal 2: Move TOC creation to the renderer

Creation of the table of contents is different from other variable processing:

  • It has to be done "late", after variable expansion, because plugins, includes, and even preference variables could create extra headings.
  • It has to "see" the whole topic text, similar to some plugin variables, but different from everything else expanded by lib/TWiki.pm.

Therefore it has been proposed (as a comment in the code) to move TOC handling to a plugin called by the renderer, which would have the additional benefit that "protected" areas are already taken out of the topic text.
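A rough Python sketch of what "late" TOC creation could look like: one pass over the already rendered HTML assigns the anchors and collects the TOC entries from the same data, so the two cannot get out of sync. The function name, the regular expression (which assumes plain heading tags without attributes), and the suffix format are assumptions for illustration and do not reflect the actual renderer.

```python
import re

# Assumption: rendered headings look like <h2>Text</h2>, without attributes.
HEADING_RE = re.compile(r"<h([1-6])>(.*?)</h\1>", re.DOTALL)

def add_anchors_and_toc(html):
    """Insert a unique anchor into every heading and return (html, toc entries)."""
    used = set()
    toc = []  # list of (level, text, anchor)

    def replace(match):
        level, inner = int(match.group(1)), match.group(2)
        text = re.sub(r"<[^>]+>", "", inner).strip()          # drop inline markup
        base = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
        anchor, n = base, 1
        while anchor in used:                                  # suffix repeats only
            n += 1
            anchor = f"{base}_{n}"
        used.add(anchor)
        toc.append((level, text, anchor))
        return f'<h{level}><a name="{anchor}"></a>{inner}</h{level}>'

    return HEADING_RE.sub(replace, html), toc

page = "<h1>Guide</h1><p>x</p><h2>Examples</h2><h1>Reference</h1><h2>Examples</h2>"
body, toc = add_anchors_and_toc(page)
for level, text, anchor in toc:
    print("  " * (level - 1) + f"{text} -> #{anchor}")
```

Since the headings are scanned after all variables, includes, and plugins have been expanded, this approach sees exactly the headings that end up on the page.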

This seems reasonable, though it doesn't need to be implemented as a traditional plugin (with all the restrictions on API): Since TWiki will always be shipped with some code to process %TOC%, if only for all the uses in the TWiki web, it can be just a hardwired call from the renderer, or a configurable handler like the LoginManager.

Proposal 3: Don't try to be smarter than the topic author

I suggest that TWiki should not try to disambiguate anchors explicitly created by topic authors. Explicit anchors are usually created because authors want to write [[Web.SomeTopic#Anchor]] elsewhere. If TWiki changed the anchor automatically, authors would try to adapt to TWiki's anchor names, which is a bad idea if headings come and go.

Proposal 4: Move TOC code into an extra module

Many topics have TOCs, but most haven't. Moving the code to create a TOC into a separate module makes it easier to override the handler, and allows compilation on-demand. I admit that I'm writing this with things like Bugs:Item3814 in mind, which would need a new TOC handler but no changes in the anchor-creating procedure.

-- HaraldJoerg - 13 Apr 2007


Discussion

Proposals 1 and 3, related to spec: Yes, I agree.

Proposal 2: I will not comment. Others know the insides of TWiki much better than I do.

Proposal 4: Sounds like a good idea.

-- KennethLavrsen - 13 Apr 2007

  • Proposal 1, Proposal 2: excellent ideas
  • Proposal 2, Proposal 4: Personally I'd prefer to see a new plugin handler, because I can see several "plugin" type applications (e.g. a TOC assembler that works over multiple topics, or one that generates numbered headings, or one that collates to an index database). I think all you would need to do is to create a new plugin handler for the _TOC call. Note that the TOC assembler needs to be able to modify headings as well as the TOC itself (e.g. to add section numbers) so there may be an issue where several plugins all want to generate/modify the TOC (perhaps a TOC API is called for here). Finally, if there are bits missing from the Func API (and there probably are), then I think we should address those holes, rather than creating a new class of plugin.

-- CrawfordCurrie - 14 Apr 2007

From my observation w.r.t. the patch I attached to Bugs:Item1607 (which follows the first Proposal), Proposal 2 has at least two advantages over Proposal 1:

  • once the rendering logic changes, it's less likely that someone forgets to adapt _TOC (and once the subroutine is part of Render.pm, we don't have to duplicate certain steps--in fact, if we can assume that TOC is the last part of a page to be rendered, there's no need to search the page for anything but HTML Hx tags)
    • Update: as long as _TOC only semi-renders a page (i.e., doesn't resolve variables contained in the headings), localised strings in particular (a) look ugly and (b) lead to the generation of wrong links (e.g., MAKETEXT becomes part of the anchor name) --mue, 12 Sep 2008
  • currently, it seems that links within a TOC contain unneeded parameters (e.g., "?skin=clean.nat%2calias%2ctagme%2cpattern;sortcol=table;up=#MyHeading_3") which means the page in question gets reloaded the first time you follow a link(?); this clearly should be dealt with in Render.pm
    • Update: the aforementioned behaviour has been reported in Bugs:Item5987 --mue, 12 Sep 2008

-- MarkusUeberall - 15 Aug 2008

Proposal 4 has one or two additional benefits:

  • disambiguation of 'raw' html anchors (which currently doesn't happen because it would bring additional overhead) could be added/toggled
  • maybe generation of 'compatible' anchornames via makeAnchorName( $text, $compatiblityMode) could be handled likewise (atm, this is only used in conjunction with the rendering of headings (cf. Render.pm); no plugin seems to call this function)

-- MarkusUeberall - 15 Sep 2008

Topic revision: r10 - 2008-09-15 - MarkusUeberall
 