Bug: No warning when entering non-standard characters


Test case

When previewing or saving, Wiki fails to warn when non-ISO characters have been introduced into the text. As a result, users of a non-ISO-compliant browser (e.g. MS IE) can enter non-standard characters and see them displayed correctly on their own machine, both while previewing and when accessing the page later, while other users see joker characters (typically "?"). Far worse, when users with an ISO-compliant browser edit that same page, they silently replace all non-ISO characters with joker characters. After that, everybody sees the joker characters, and fixing them becomes much more painful, because the different original characters can no longer be searched for: they are now all ordinary, undifferentiated "?". Equally bad, several MS-dependent users can cooperate on a page for days without realizing their mistake, and only discover it when an ISO-conforming user shows up.

  • This seems to be mainly an issue of users whose browser uses the Windows-1252 character set working on a site that is set to ISO-8859-1. The Windows character set has more characters than ISO-8859-1, hence the 'joker characters' that can't be handled by ISO-8859-1 compliant browsers. Most browsers on Windows support both character sets. The solution is to explicitly set the character set in TWiki, which is supported from TWikiRelease01Feb2003. -- RichardDonkin
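To illustrate the kind of check the bug report asks for, here is a minimal Python sketch (not TWiki code; TWiki itself is written in Perl) that lists the characters in a piece of text that ISO-8859-1 cannot represent. The function name `find_non_latin1` and the sample text are made up for illustration.

```python
def find_non_latin1(text):
    """Return (index, character) pairs for characters ISO-8859-1 cannot encode."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode("iso-8859-1")
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


# Curly quotes (0x93, 0x94) and the euro sign (0x80) exist in Windows-1252
# but have no equivalent in ISO-8859-1; the e-acute (0xE9) exists in both.
sample = b"\x93caf\xe9\x94 costs 3\x80".decode("cp1252")

for index, char in find_non_latin1(sample):
    print(f"position {index}: U+{ord(char):04X} is not in ISO-8859-1")
```

A warning on the Preview page could be driven by a non-empty result from a check like this, assuming the submitted text has already been decoded correctly.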


TWiki version: 20001201 stable
TWiki plugins: none
Server OS: Solaris 5.7 Sparc
Web server: apache-1.3.9
Perl version: 5.6.1
Client OS: windows NT
Web Browser: IE

-- PierreParquier - 09 Aug 2001

Earlier incorrect fix ideas deleted.

Fix record

As Colas suggested, replaced us-ascii with ISO-8859-1 in all template files. This is in TWikiAlphaRelease and on TWiki.org.

-- PeterThoeny - 25 Nov 2001

Have set this back to BugAssigned, since 8859-1 works only for Western Europe - the correct solution is to define a TWiki variable, e.g. %CHARSET%, and set this to ISO-8859-1 by default. This would mean that users of 8859-2, e.g. in Poland, and of other character sets, can use TWiki without editing the template files.

This needs some testing to make sure it works - see TWikiAndNationalCharacters.

-- RichardDonkin - 01 Jul 2002

Put a comment above to warn people off the HTTP_EQUIV variables. Also, see InternationalisationEnhancements, which will mostly fix this issue.

-- RichardDonkin - 07 Dec 2002

Mostly fixed in TWikiAlphaRelease - you can set a site-wide charset in TWiki.cfg, which is used in all templates and HTTP headers.
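The idea of a single site-wide setting feeding both the templates and the HTTP headers can be sketched as follows. This is an illustrative Python sketch, not the actual TWiki.cfg mechanism: `SITE_CHARSET`, `content_type_header`, `expand_template` and the template line are all hypothetical, and `%CHARSET%` is simply the variable name proposed earlier in this topic.

```python
# Hypothetical site-wide setting; a Polish site would use "ISO-8859-2" instead.
SITE_CHARSET = "ISO-8859-1"

# Illustrative template line that uses the proposed %CHARSET% variable.
TEMPLATE = '<meta http-equiv="Content-Type" content="text/html; charset=%CHARSET%" />'


def content_type_header(charset=SITE_CHARSET):
    """Build the HTTP header from the site-wide charset."""
    return f"Content-Type: text/html; charset={charset}"


def expand_template(template, charset=SITE_CHARSET):
    """Replace the %CHARSET% variable in a template with the site-wide charset."""
    return template.replace("%CHARSET%", charset)


print(content_type_header())
print(expand_template(TEMPLATE))
```

Because both outputs are derived from the same value, the header and the page markup cannot disagree about the character set, and no template file needs editing when the site charset changes.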

-- RichardDonkin - 08 Dec 2002

The original posting seems to refer to entering characters in the Windows-1252 charset (a superset of ISO-8859-1) while the site is using ISO-8859-1.

This sort of bug should happen less frequently due to TWiki I18N in TWikiRelease01Feb2003, but it can still happen if a user enters a character that is not in the site character set (e.g. entering ISO-8859-2 when the site is using ISO-8859-1) - the browser will typically create a Numeric Character Reference (NCR) that is drawn from the Unicode character set (beyond the first 256 codepoints, which are numbered identically to ISO-8859-1), e.g. &#337; which generates 'ő'.
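The NCR case is detectable, since the reference is visible in the submitted text. A minimal Python sketch of such a check, assuming the site charset is ISO-8859-1 (the names `NCR_PATTERN` and `find_ncrs_outside_latin1` are made up for illustration and are not part of TWiki):

```python
import re

# Matches decimal (&#337;) and hexadecimal (&#x151;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_ncrs_outside_latin1(text):
    """Return (reference, codepoint) pairs for NCRs beyond the first 256 codepoints."""
    found = []
    for match in NCR_PATTERN.finditer(text):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if codepoint > 0xFF:
            found.append((match.group(0), codepoint))
    return found


print(find_ncrs_outside_latin1("Erd&#337;s and caf&#233;"))
# → [('&#337;', 337)]
```

A check like this cannot tell a browser-generated NCR from one an author typed deliberately, so it is better suited to a warning than to blocking the save.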

In other cases, the character may be entered directly into the page and interpreted in the (wrong) site character set. This is impossible to detect in TWiki.
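A small Python sketch shows why this case cannot be detected on the server: in a single-byte charset, the byte a user sends for one character is also a perfectly valid byte for a different character in the site charset. The helper name `misread` is hypothetical.

```python
def misread(text, typed_charset, site_charset):
    """Encode text as the user's browser sent it, then decode it as the site would."""
    return text.encode(typed_charset).decode(site_charset)


# 'ő' in ISO-8859-2 is the single byte 0xF5, which ISO-8859-1 reads as 'õ'.
print(misread("ő", "iso-8859-2", "iso-8859-1"))
# → õ

# The server receives the same byte in both cases, so it has nothing to test for.
print("ő".encode("iso-8859-2") == "õ".encode("iso-8859-1"))
# → True
```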

Full ProposedUTF8SupportForI18N will fix this, but TWiki should probably warn that this is happening, perhaps in the Preview page.

It might also be useful to let people know in the Edit page the character set they are using, but I suspect most users won't know what this means.

-- RichardDonkin - 15 Sep 2003

Note that this topic appears in the To-Do list of AthensRelease as 100% for Spec/Impl/Docs. It could be marked as ProposedFor AthensRelease and MergedToCore if there is no further work required.

-- SamHasler - 15 Feb 2005

TopicClassification: BugReport
TopicSummary: When previewing or saving, fails to warn in case non ISO characters were introduced in the text
CurrentState: UnderInvestigation
OutstandingIssues: No consensus on whether this needs to be fixed. CoreTeam please set a priority.
InterestedParties: RichardDonkin


Topic revision: r19 - 2005-02-15 - SamHasler
Copyright © 1999-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.