r6 - 17 May 2007 - 00:10:55 - PeterThoeny
Tags: internationalization

Bug: UTF-8 encoded anchor breaks page rendering

The anchor generated for a UTF-8-encoded header can be truncated in the middle of a multi-byte UTF-8 character. This makes InternetExplorer mess up the whole page.

Test case

Site charset = utf-8; almost any UTF-8-encoded header in the page text triggers it.
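
To reproduce the truncation step in isolation, outside TWiki, a minimal sketch along these lines may help. It applies the unpatched 32-byte cut to a made-up header and checks whether the result is still well-formed UTF-8. The header string and the helper name is_valid_utf8 are assumptions for illustration only; they are not taken from Render.pm.

```perl
use strict;
use warnings;

# Hypothetical header: one ASCII letter followed by 20 copies of U+041F,
# written as raw UTF-8 bytes (0xD0 0x9F), 41 bytes in total.
my $header = 'A' . ( "\xD0\x9F" x 20 );

# The unpatched truncation step, operating on bytes.
( my $anchor = $header ) =~ s/^(.{32})(.*)$/$1/;

# Hypothetical helper: utf8::decode() returns false when its argument
# is not well-formed UTF-8, so it doubles as a validity check.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode works in place, so use a copy
    return utf8::decode($copy) ? 1 : 0;
}

printf "bytes kept:  %d\n", length($anchor);
printf "last byte:   0x%02X\n", ord( substr( $anchor, -1 ) );
printf "valid utf-8: %s\n", is_valid_utf8($anchor) ? 'yes' : 'no';
```

Because of the single leading ASCII byte, byte 32 is the lead byte 0xD0 of a two-byte character, so the cut leaves a dangling lead byte and the validity check fails.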

Environment

TWiki version: TWikiRelease04Sep2004
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: linux
Web server: apache2
Perl version: 5.8.4
Client OS: win2k
Web Browser: IE5

-- VasilyRedkin - 26 Jul 2006

Impact and Available Solutions

I've developed the following patch. It is not very pretty, but it works for me.

--- lib/TWiki/orig/Render.pm    2006-06-25 20:19:11.000000000 +0400
+++ lib/TWiki/Render.pm 2006-07-26 15:14:46.881104037 +0400
@@ -399,7 +399,7 @@
     if ( !$compatibilityMode ) {
         $anchorName =~ s/^[\s\#\_]*//;  # no leading space nor '#', '_'
     }
-    $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars - FIXME: Use Unicode chars before truncate
+    $anchorName =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/; # limit to 32..37 chars, cut on utf-8 char boundary
     if ( !$compatibilityMode ) {
         $anchorName =~ s/[\s\_]*$//;    # no trailing space, nor '_'
     }

-- VasilyRedkin - 26 Jul 2006
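
For anyone who wants to check what the patched regex does, here is a small sketch that wraps the substitution from the patch above in a standalone function and runs it on byte strings. The function name truncate_anchor and the sample strings are assumptions for illustration. The idea of the regex: keep at least 32 bytes (lazily), then require the next byte to be ASCII or a UTF-8 lead byte, i.e. not a continuation byte in the range 0x80-0xBF.

```perl
use strict;
use warnings;

# Truncation step from the patch, wrapped in a hypothetical helper.
sub truncate_anchor {
    my ($name) = @_;
    $name =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/;
    return $name;
}

my $two_byte = "\xD0\x9F";    # U+041F as raw UTF-8 bytes

# 40 bytes: byte 33 starts a new character, so the cut can stay at 32.
my $aligned = $two_byte x 20;

# 41 bytes: byte 33 is a continuation byte, so the cut moves one byte on.
my $unaligned = 'A' . ( $two_byte x 20 );

# Shorter than 32 bytes: the regex does not match, nothing changes.
my $short = 'Intro';

for my $name ( $aligned, $unaligned, $short ) {
    my $cut = truncate_anchor($name);
    printf "%d bytes -> %d bytes\n", length($name), length($cut);
}
```

With two-byte characters the result is 32 or 33 bytes and always ends on a character boundary; longer UTF-8 sequences can push the cut a few more bytes out, which is where the "32..37" in the patch comment comes from.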

Follow up

Thanks, Vasily, for the report and fix; some people might find this useful. However, TWikiRelease04Sep2004 is no longer actively maintained.

-- PeterThoeny - 29 Jul 2006

This bug also applies to TWiki 4.x, since the code is the same up to 4.0.4 at least.

I haven't yet deciphered the regex to check that it's correct, and it's unlikely to work when we turn on Unicode character mode or with other 8-bit character sets (e.g. those that use almost entirely high-bit characters, such as KOI-8). Presumably any European 2-byte UTF-8 character would be enough as a test case.

This code should not go in as it is, since it will break with non-UTF-8 character sets. However, it may be useful for people using UTF-8 as their site character set.
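
One possible way to limit the patch to UTF-8 sites is to pick the truncation rule by character set. The following is only a sketch under stated assumptions: the function name truncate_anchor_for_charset and its $charset parameter are hypothetical and stand in for however the installation exposes its site character set; this is not code from TWiki.

```perl
use strict;
use warnings;

# Hypothetical helper: $charset stands in for the site character set.
sub truncate_anchor_for_charset {
    my ( $name, $charset ) = @_;
    if ( defined $charset && $charset =~ /^utf-?8$/i ) {
        # UTF-8: allow a few extra bytes so the cut lands on a
        # character boundary (regex taken from the patch).
        $name =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/;
    }
    else {
        # Anything else: keep the original fixed 32-byte cut. That is
        # safe for single-byte sets; multi-byte sets other than UTF-8
        # are not handled here.
        $name =~ s/^(.{32})(.*)$/$1/;
    }
    return $name;
}

# 40 bytes that all fall in 0x80-0xBF, as can happen with a single-byte
# character set that places letters in that range.
my $high_bytes = "\xB0" x 40;

printf "as iso-8859-5: %d bytes\n",
    length( truncate_anchor_for_charset( $high_bytes, 'iso-8859-5' ) );
printf "as utf-8:      %d bytes\n",
    length( truncate_anchor_for_charset( $high_bytes, 'utf-8' ) );
```

The sample string shows why the UTF-8 rule should not be applied blindly: when every byte after position 32 looks like a continuation byte, the patched regex finds no cut point and the name is not shortened at all.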

This is somewhat like other TOC issues listed at InternationalisationIssues, which should really be resolved at the same time.

-- RichardDonkin - 31 Jul 2006

I filed Bugs:Item2711 for TWiki 4.

-- PeterThoeny - 01 Aug 2006

This bug is still not fixed in TWiki 4.1.1!

-- AndreyTkachenko - 11 Feb 2007

Tracked now in Bugs:Item4074.

-- PeterThoeny - 17 May 2007

 
