Refactoring Proposal: Repair some of the quirks with TWiki's Anchors
Motivation
There are a couple of Support and Codev topics about TWiki's %TOC% handling failing when a topic contains headings with identical text. But this is only one of the more visible ways in which %TOC% is broken. Actually it isn't only the processing of %TOC% that is at fault, but the general handling of anchors (link targets within a page).
It will take some time to fix it, so why not write down what we've learnt so far?
Description
While walking through the code of TWikiRelease04x01, I describe how TOC handling currently works, and where and why it fails. This is more a design discussion than a feature request. It may come in handy if enhancements to TOC such as Bugs:Item3814 are investigated.
Impact and Available Solutions
Documentation
For more than five years TWiki has lived with a problem: links created in the table of contents (the %TOC% variable, see VarTOC2) fail to work if different headings have the same text. This rarely happens with top-level headings, but longer topics occasionally have several sections with the same subheadings in each.
The reason is that the anchors pointing to the headings are currently created from the text of the heading only, so identical heading texts yield identical anchors. Since anchor names must be unique within a document, this is one of the (very few) situations where TWiki creates invalid XHTML.
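To illustrate, here is a minimal Python sketch of context-free anchor naming (TWiki itself is written in Perl). The function make_anchor_name and its character rules are a hypothetical simplification, not TWiki's actual code. The only point it makes is that a name computed from the heading text alone must repeat whenever the text repeats.

```python
import re

def make_anchor_name(heading: str) -> str:
    """Hypothetical, simplified context-free anchor naming: every run of
    non-alphanumeric characters becomes a single underscore. The result
    depends on the heading text only."""
    return re.sub(r"[^A-Za-z0-9]+", "_", heading.strip()).strip("_")

headings = ["Motivation", "Description", "Examples", "Motivation"]
anchors = [make_anchor_name(h) for h in headings]
print(anchors)
# ['Motivation', 'Description', 'Examples', 'Motivation']
```

The first and last entries collide, so a link to #Motivation can only ever reach the first of the two headings.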
Looking at the code, I found other shortcomings as well:
- Topic authors can add their own anchors to the text, using #WikiWord syntax. There is no mechanism to prevent topic authors from creating duplicate anchors, and no mechanism to prevent explicit #WikiWord anchors from colliding with anchors created for %TOC%.
- One can specify an alternate topic for which the table of contents is to be shown: %TOC{"OtherTopic"}%. Most of the time such a table of contents looks identical to what you get if you expand the TOC in the topic itself, but there are exceptions: headings produced by TWiki variables (plugin variables, but also %INCLUDE{...}% or a bookview search) are only shown if the TOC is in the same topic. This is insidious, since missing lines in a table of contents are far less obvious than entries which link to the wrong heading.
Examples
Just take the following subsection as an example of why a fix would come in handy:
Motivation
You cannot jump to this section from the table of contents, can you?
Current Implementation
Currently, creating a table of contents is done in two separate modules:
- In lib/TWiki.pm the %TOC% variable is expanded by reading the topic line by line and creating links to the headings where appropriate. However, if the topic is not the current one, then the topic text is read "as is", without any variable expansion.
- Later in the processing of a request, in lib/TWiki/Render.pm, the headings are rendered. They get their <a name="Heading_Text"></a> anchors during that phase. This needs to be done regardless of whether there's a %TOC% in the topic, because a %TOC% pointing to this topic could be in any other topic. Sometimes (due to some "compatibility" mechanism which I haven't understood yet) two anchors are created for some headings.
Requirement: Anchor creation and link creation must stay in sync
If a table of contents is created for another page, then the link in the TOC and the anchor
which serves as its target are not created in the same HTTP request. To make sure that the TOC
entries actually point to existing anchors, the algorithm which creates both the link's target
name and the anchor's name must be the same in lib/TWiki.pm and lib/TWiki/Render.pm. The
current code ensures this by using the same subroutine in both places and by creating anchor
names "context-free", passing only the text of the heading.
If that routine were to disambiguate anchor names which come out identical, it would need some
sort of "context", a memory of all anchors it has created so far. A simple Perl hash will do.
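A minimal Python sketch of such a memory follows. A Python set plays the role of the Perl hash. The class AnchorNamer, the simplified make_anchor_name, and the "_1, _2, ..." suffix format are all hypothetical choices for illustration.

```python
import re

def make_anchor_name(heading: str) -> str:
    """Hypothetical, simplified context-free naming: every run of
    non-alphanumeric characters becomes a single underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", heading.strip()).strip("_")

class AnchorNamer:
    """Remembers every anchor name handed out so far (the "context").

    The first occurrence of a name is returned unchanged. Later duplicates
    get a numeric suffix (_1, _2, ...); the suffix format is an assumption.
    """

    def __init__(self) -> None:
        self.used: set[str] = set()

    def unique(self, heading: str) -> str:
        base = make_anchor_name(heading)
        name, n = base, 0
        # Loop rather than trusting a counter, because a suffixed name could
        # itself clash with a real heading such as "Motivation 1".
        while name in self.used:
            n += 1
            name = f"{base}_{n}"
        self.used.add(name)
        return name

namer = AnchorNamer()
print([namer.unique(h) for h in ["Motivation", "Description", "Motivation", "Motivation"]])
# ['Motivation', 'Description', 'Motivation_1', 'Motivation_2']
```

The result now depends on every heading seen before, which is exactly the context that a context-free subroutine avoids.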
But wait: if links and anchors are created in separate HTTP requests, both need to build up the
same memory, which requires that they see the same headings. As we've seen, this is not the
case with the TOC for another topic if that topic creates headings through %INCLUDE{...}%.
Additionally, the renderer creates anchor names not only for anchors in the same topic, but
also for the anchor part of links elsewhere, such as [[Web.SomeTopic#Anchor]].
These anchors must be left "as provided" and not disambiguated.
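The following hypothetical Python sketch reproduces the problem. Each request keeps its own memory, but the two requests see different heading lists. The helper names, the suffix scheme, and the sample headings are assumptions. The sketch also shows link anchors being normalized context-free and never suffixed.

```python
import re

def make_anchor_name(text: str) -> str:
    # Hypothetical, simplified context-free naming.
    return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_")

def unique_anchors(headings: list[str]) -> list[str]:
    """One pass with its own memory: first occurrence unchanged,
    duplicates get _1, _2, ... (hypothetical suffix scheme)."""
    used: set[str] = set()
    result = []
    for heading in headings:
        base = make_anchor_name(heading)
        name, n = base, 0
        while name in used:
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        result.append(name)
    return result

def link_anchor(link_target: str) -> str:
    """Anchor part of a link such as Web.SomeTopic#Anchor: normalized
    context-free, never run through a memory, never suffixed."""
    _, _, anchor = link_target.partition("#")
    return make_anchor_name(anchor)

# Request 1, TOC for another topic: the raw text lacks the heading that
# %INCLUDE{...}% would produce.
toc_pass = unique_anchors(["Intro", "Details"])
# Request 2, rendering that topic: the included "Details" comes first.
render_pass = unique_anchors(["Intro", "Details", "Details"])

print(toc_pass)     # ['Intro', 'Details']
print(render_pass)  # ['Intro', 'Details', 'Details_1']
print(link_anchor("Web.SomeTopic#Anchor"))  # Anchor
```

The TOC entry for the topic's own "Details" heading links to #Details. In the rendered page, that name now belongs to the included heading, while the topic's own heading received Details_1.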
Target Implementation
Proposal 1: Make Anchors Unique
As proposed in FreetownReleaseMeeting2007x03x12: the first occurrence of every
"autogenerated" anchor name stays as it is today; only duplicates are modified.
This will keep 99% of today's TOCs and links to headings unchanged, and those
which do change were broken anyway: duplicate anchors are useless as link
targets and break XHTML validity.
Proposal 2: Move TOC creation to the renderer
Creation of the table of contents is different from other variable processing:
- It has to be done "late", after variable expansion, because plugins, includes, and even preference variables could create extra headings.
- It has to "see" the whole topic text, similar to some plugin variables, but unlike everything else expanded by lib/TWiki.pm.
Therefore it has been proposed (as a comment in the code) to move TOC handling to a plugin
called by the renderer, which would have the additional benefit that "protected" areas are
already taken out of the topic text.
This seems reasonable, though it doesn't need to be implemented as a traditional plugin (with
all the restrictions of the plugin API): since TWiki will always be shipped with some code to
process %TOC%, if only for all the uses in the TWiki web, it can simply be a hardwired call
from the renderer, or a configurable handler like the LoginManager.
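A TOC assembled late only needs to look at the rendered heading tags. Each TOC link can then reuse the anchor that the renderer actually emitted, so links and anchors cannot drift apart. Below is a minimal Python sketch of this idea. It assumes, hypothetically, that each rendered heading carries its <a name="..."></a> anchor right after the opening tag. The regex and the sample markup are illustrative, not TWiki's actual output, and regex matching is only acceptable here because the markup comes from our own renderer.

```python
import re
from html import unescape

# Matches simplified rendered headings such as
#   <h2><a name="Heading_Text"></a> Heading Text </h2>
HEADING_RE = re.compile(
    r'<h([1-6])[^>]*>\s*<a name="([^"]+)"></a>(.*?)</h\1>',
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")

def toc_entries(rendered_html: str) -> list[tuple[int, str, str]]:
    """Return (level, anchor, text) for every rendered heading.
    Runs after rendering, so headings from %INCLUDE%, plugins, etc. are present."""
    entries = []
    for level, anchor, inner in HEADING_RE.findall(rendered_html):
        text = unescape(TAG_RE.sub("", inner)).strip()
        entries.append((int(level), anchor, text))
    return entries

SAMPLE = """
<h1><a name="Motivation"></a> Motivation </h1>
<p>...</p>
<h2><a name="Examples"></a> Examples </h2>
<h3><a name="Motivation_1"></a> Motivation </h3>
"""

for level, anchor, text in toc_entries(SAMPLE):
    print("  " * (level - 1) + f'<a href="#{anchor}">{text}</a>')
```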
Proposal 3: Don't try to be smarter than the topic author
I suggest that TWiki should not try to disambiguate anchors explicitly created by topic authors.
Explicit anchors are usually created because authors want to write [[Web.SomeTopic#Anchor]]
elsewhere. If TWiki changed such anchors automatically, authors would try to adapt to TWiki's
generated anchor names, which is a bad idea since headings come and go.
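The following hypothetical Python sketch combines Proposals 1 and 3. Anchors written by the author pass through verbatim, even when duplicated, while heading anchors are disambiguated among themselves. The (kind, text) input format and the suffix scheme are assumptions made for illustration.

```python
import re

def make_anchor_name(text: str) -> str:
    # Hypothetical, simplified context-free naming.
    return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_")

def assign_anchors(items: list[tuple[str, str]]) -> list[str]:
    """items are (kind, text) pairs: kind "explicit" is an author's #WikiWord
    anchor, kind "heading" is an autogenerated heading anchor."""
    used: set[str] = set()
    names = []
    for kind, text in items:
        if kind == "explicit":
            # Proposal 3: emitted exactly as written, so that
            # [[Web.SomeTopic#Anchor]] links elsewhere keep working.
            names.append(text)
            continue
        # Proposal 1: first occurrence unchanged, later duplicates suffixed.
        base = make_anchor_name(text)
        name, n = base, 0
        while name in used:
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        names.append(name)
    return names

print(assign_anchors([
    ("heading", "Motivation"),
    ("explicit", "KeyPoint"),
    ("heading", "Motivation"),
    ("explicit", "KeyPoint"),
]))
# ['Motivation', 'KeyPoint', 'Motivation_1', 'KeyPoint']
```

A duplicated explicit anchor stays duplicated. Under Proposal 3, fixing it is left to the author.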
Proposal 4: Move TOC code into an extra module
Many topics have TOCs, but most don't. Moving the code that creates a TOC into a separate module
makes it easier to override the handler, and allows on-demand compilation. I admit that I'm writing
this with things like Bugs:Item3814 in mind, which would need a new TOC handler but no changes to
the anchor-creating procedure.
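Here is a minimal Python sketch of an overridable TOC handler that is only built on first use. In Python, importlib.import_module would be the natural place to load a separate module on demand. To stay self-contained, this sketch defines the handler inline. The registry, its function names, and the handler signature are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Heading = Tuple[int, str, str]               # (level, anchor, text)
TocHandler = Callable[[List[Heading]], str]  # headings -> TOC markup

_factories: Dict[str, Callable[[], TocHandler]] = {}
_built: Dict[str, TocHandler] = {}

def register_toc_handler(name: str, factory: Callable[[], TocHandler]) -> None:
    """Register or override a handler; the factory is not called yet."""
    _factories[name] = factory
    _built.pop(name, None)  # forget a previously built handler of that name

def get_toc_handler(name: str = "default") -> TocHandler:
    """Build the handler on first use, i.e. only when a topic has a %TOC%."""
    if name not in _built:
        _built[name] = _factories[name]()
    return _built[name]

def _default_factory() -> TocHandler:
    # A real system might load a separate module here (for example with
    # importlib.import_module); this sketch defines the handler inline.
    def render(headings: List[Heading]) -> str:
        return "\n".join("  " * (level - 1) + f'<a href="#{anchor}">{text}</a>'
                         for level, anchor, text in headings)
    return render

register_toc_handler("default", _default_factory)

toc = get_toc_handler()
print(toc([(1, "Motivation", "Motivation"), (2, "Examples", "Examples")]))
```

A different TOC style then needs only another register_toc_handler call. The anchor-creating code is not involved.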
-- HaraldJoerg - 13 Apr 2007
Discussion
Proposals 1 and 3 (related to the spec): Yes, I agree.
Proposal 2: I will not comment. Others know the inside of TWiki much better than I do.
Proposal 4: Sounds like a good idea.
-- KennethLavrsen - 13 Apr 2007
- Proposal 1, Proposal 2: excellent ideas
- Proposal 2, Proposal 4: Personally I'd prefer to see a new plugin handler, because I can see several "plugin" type applications (e.g. a TOC assembler that works over multiple topics, or one that generates numbered headings, or one that collates to an index database). I think all you would need to do is to create a new plugin handler for the _TOC call. Note that the TOC assembler needs to be able to modify headings as well as the TOC itself (e.g. to add section numbers) so there may be an issue where several plugins all want to generate/modify the TOC (perhaps a TOC API is called for here). Finally, if there are bits missing from the Func API (and there probably are), then I think we should address those holes, rather than creating a new class of plugin.
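The following hypothetical Python sketch makes the "several plugins" concern concrete. Each plugin is a filter over the list of (level, text) headings, and the filters run in sequence. number_headings stands in for the numbered-headings use case. The filter interface is an assumption, not an existing TWiki or Func API.

```python
from typing import Callable, List, Tuple

Heading = Tuple[int, str]                                # (level, text)
HeadingFilter = Callable[[List[Heading]], List[Heading]]

def number_headings(headings: List[Heading]) -> List[Heading]:
    """Hypothetical plugin filter: prefix each heading with its section number."""
    counters: List[int] = []
    numbered = []
    for level, text in headings:
        del counters[level:]            # leaving a deeper section: drop its counters
        while len(counters) < level:    # entering a deeper (or skipped) level
            counters.append(0)
        counters[level - 1] += 1
        numbered.append((level, ".".join(map(str, counters)) + " " + text))
    return numbered

def apply_filters(headings: List[Heading], filters: List[HeadingFilter]) -> List[Heading]:
    """Run filters in order; each one sees the previous filter's output."""
    for heading_filter in filters:
        headings = heading_filter(headings)
    return headings

print(apply_filters(
    [(1, "Motivation"), (2, "Examples"), (2, "Motivation"), (1, "Discussion")],
    [number_headings],
))
```

The order of the filters matters. Also, if anchor names are derived from heading text, a filter that changes the text must be applied identically when the TOC is built and when the headings are rendered, or links and anchors diverge.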
-- CrawfordCurrie - 14 Apr 2007
From my observations regarding the patch I attached to Bugs:Item1607 (which follows Proposal 1), Proposal 2 has at least two advantages over Proposal 1:
- Once the rendering logic changes, we are less likely to forget to adapt _TOC (and once the subroutine is part of Render.pm, we don't have to duplicate certain steps; in fact, if we can assume that the TOC is the last part of a page to be rendered, there's no need to search the page for anything but HTML Hx tags)
- Update: as long as _TOC only semi-renders a page (i.e., doesn't resolve variables contained in the headings), localised strings in particular (a) look ugly and (b) lead to wrong links being generated (e.g., MAKETEXT becomes part of the anchor name). -- mue, 12 Sep 2008
- Currently, links within a TOC seem to contain unneeded parameters (e.g., "?skin=clean.nat%2calias%2ctagme%2cpattern;sortcol=table;up=#MyHeading_3"), which means the page in question gets reloaded the first time you follow a link(?); this clearly should be dealt with in Render.pm
- Update: the aforementioned behaviour has been reported in Bugs:Item5987. -- mue, 12 Sep 2008
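The following Python sketch shows how a TOC link could avoid the extra parameters. For the current topic, a bare fragment is enough, and following such a link does not reload the page. For another topic, the query string is dropped and only the fragment is kept. The helper toc_link and the sample URL are hypothetical. This sketch drops every parameter, whereas a real fix might need to keep some of them.

```python
from urllib.parse import urlsplit, urlunsplit

def toc_link(page_url: str, anchor: str, same_topic: bool) -> str:
    """Hypothetical helper: build the href for one TOC entry."""
    if same_topic:
        # A fragment-only link jumps within the already loaded page.
        return "#" + anchor
    # Another topic: keep scheme, host and path, drop the query string,
    # and use the anchor as the fragment.
    scheme, netloc, path, _query, _fragment = urlsplit(page_url)
    return urlunsplit((scheme, netloc, path, "", anchor))

url = ("http://example.org/bin/view/Main/SomeTopic"
       "?skin=clean.nat%2calias%2ctagme%2cpattern;sortcol=table;up=")
print(toc_link(url, "MyHeading_3", same_topic=True))
print(toc_link(url, "MyHeading_3", same_topic=False))
```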
-- MarkusUeberall - 15 Aug 2008
Proposal 4 has one or two additional benefits:
- disambiguation of 'raw' HTML anchors (which currently doesn't happen because it would add overhead) could be added, or made switchable
- maybe the generation of 'compatible' anchor names via makeAnchorName( $text, $compatiblityMode) could be handled likewise (at the moment, this is only used when rendering headings (cf. Render.pm); no plugin seems to call this function)
-- MarkusUeberall - 15 Sep 2008