
Question

Upgrading to Dakar seems to have changed our wiki's character set from iso-8859-1 to iso-8859-15. This is causing Windows-1252 encoded dashes and smart quotes (bytes 0x91 to 0x97) in old topics to be displayed incorrectly (as ? or a square, depending on the browser).

The special characters in question appear to be illegal in both iso-8859-1 and iso-8859-15. However, with our Cairo install the browsers were treating iso-8859-1 as Windows-1252; with the change to iso-8859-15 this no longer happens. I have tried setting {UseLocale} = 1 and {Site}{Locale} = iso-8859-1 as per the docs, but it makes no difference: pages are still rendered in iso-8859-15.
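The mismatch can be reproduced outside the browser. The following is only an illustrative sketch in Python (not part of any TWiki setup) and relies on Python's standard codecs: it decodes bytes 0x91 to 0x97 under each of the three character sets. The helper name `printable` is made up for this example.

```python
# Bytes 0x91-0x97, as they would be stored in a topic pasted from MS Word.
raw = bytes(range(0x91, 0x98))


def printable(text):
    """Return True if no character falls in the C1 control range U+0080-U+009F."""
    return not any(0x80 <= ord(ch) <= 0x9F for ch in text)


as_cp1252 = raw.decode("cp1252")       # curly quotes, bullet, en dash, em dash
as_latin1 = raw.decode("iso-8859-1")   # C1 control characters
as_latin9 = raw.decode("iso-8859-15")  # the same C1 control characters

for name, text in [
    ("windows-1252", as_cp1252),
    ("iso-8859-1", as_latin1),
    ("iso-8859-15", as_latin9),
]:
    code_points = [f"U+{ord(ch):04X}" for ch in text]
    print(name, code_points, "printable:", printable(text))
```

Only the Windows-1252 interpretation yields visible punctuation. Under a strict reading of either ISO character set the same bytes are control characters, which is consistent with a browser showing ? or a square.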

Ideally, I want to allow special characters in topics, but not in wiki words.

I can see three possible solutions:

  1. replace the special characters with their ASCII equivalents
  2. convert to utf-8
  3. convince Dakar to render in iso-8859-1 or Windows-1252

I'm nervous about converting the whole installation to utf-8 (we would need to migrate/map the characters in all the old topics, and there is the complexity of utf-8 in topic names, etc.), and I'm not keen on losing the special characters with option 1.
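For reference, options 1 and 2 could look roughly like this on the raw bytes of a single topic. This is a minimal Python sketch under stated assumptions, not a tested migration tool: it assumes the topic text really is Windows-1252, and the function names and the replacement table (`ASCII_EQUIVALENTS`) are hypothetical choices made for this example.

```python
# Hypothetical replacement table for option 1 (Windows-1252 punctuation -> ASCII).
ASCII_EQUIVALENTS = {
    "\u2018": "'",   # left single quote  (byte 0x91)
    "\u2019": "'",   # right single quote (byte 0x92)
    "\u201c": '"',   # left double quote  (byte 0x93)
    "\u201d": '"',   # right double quote (byte 0x94)
    "\u2022": "*",   # bullet             (byte 0x95)
    "\u2013": "-",   # en dash            (byte 0x96)
    "\u2014": "--",  # em dash            (byte 0x97)
}


def to_ascii_punctuation(raw: bytes) -> bytes:
    """Option 1: swap the special punctuation for ASCII, keep iso-8859-1 storage."""
    text = raw.decode("cp1252")
    for special, plain in ASCII_EQUIVALENTS.items():
        text = text.replace(special, plain)
    # Raises UnicodeEncodeError if other Windows-1252-only characters remain.
    return text.encode("iso-8859-1")


def to_utf8(raw: bytes) -> bytes:
    """Option 2: re-encode the topic text as utf-8 without losing characters."""
    return raw.decode("cp1252").encode("utf-8")


sample = b"\x93quoted\x94 \x96 caf\xe9"
print(to_ascii_punctuation(sample))
print(to_utf8(sample))
```

Option 1 is lossy but leaves accented characters such as é untouched. Option 2 keeps everything but changes the byte length of every non-ASCII character, which is part of why converting a whole installation needs care.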

Configure output attached (without the locale setting).

Any help would be appreciated.

Thanks, Martin

Environment

TWiki version: TWikiRelease04x00x05
TWiki plugins: SpreadSheetPlugin BeautifierPlugin CalendarPlugin CommentPlugin ConditionalPlugin DatabasePlugin EditTablePlugin InterwikiPlugin JiraLinkPlugin LdapPlugin LinkOptionsPlugin MacrosPlugin PreferencesPlugin RedirectPlugin RenderListPlugin SlideShowPlugin SmiliesPlugin TablePlugin TagMePlugin WysiwygPlugin
Server OS: SuSE Enterprise (SLES) 9.3, kernel 2.6.5
Web server: Apache 2.0.49
Perl version: 5.8.3
Client OS: MS Windows XP SP2
Web Browser: Firefox 1.5, 2.0, IE6, IE7
Categories: Internationalisation

-- MartinRothbaum - 06 Dec 2006

Answer


Change the charset in both LocalSite.cfg and configure.

-- SteveStark - 06 Dec 2006

Thanks very much Steve - seems obvious, and I probably should have tried it already but I was sticking with the advice in the InstallationWithI18N topic about (almost) never changing charset. Should this be updated for my situation?

I didn't do anything to set the locale/charset when we installed Cairo and there must be some users out there pasting Windows-1252 characters into wiki topics from MS Word etc.

-- MartinRothbaum - 06 Dec 2006

Go ahead and make the changes in the topic, as they can always be reverted.

-- SteveStark - 06 Dec 2006

I would but I don't think I'm qualified to recommend this option on the basis of this one experience.

-- MartinRothbaum - 07 Dec 2006

If you are having problems with ISO-8859-15, check which locales are available on your machine using locale -a, then choose one that matches your country/language preferences and supports ISO-8859-1, e.g. de_DE.iso-8859-1 if you are in Germany. Make sure the one you choose is listed by locale -a on your server.

The locale setting should not be just ISO-8859-1 but de_DE.iso-8859-1 - using the former may cause problems.
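Picking a suitable entry out of the locale -a listing can be sketched as follows. This is a hedged Python illustration: the function names `latin1_locales` and `installed_locales` are invented for this example, and the spelling of the codeset suffix (iso88591, ISO-8859-1, and so on) varies between systems, so the comparison is normalised.

```python
import subprocess


def latin1_locales(locale_names):
    """Keep only the locale names whose codeset suffix denotes ISO-8859-1."""
    matches = []
    for name in locale_names:
        if "." not in name:
            continue  # no explicit codeset (e.g. "C" or "de_DE"): skipped here
        codeset = name.split(".", 1)[1].split("@", 1)[0]
        normalised = codeset.lower().replace("-", "").replace("_", "")
        if normalised == "iso88591":
            matches.append(name)
    return matches


def installed_locales():
    """Return the names reported by `locale -a` on this machine."""
    result = subprocess.run(
        ["locale", "-a"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Sample listing, used here instead of a real server's output.
sample_listing = [
    "C",
    "de_DE.iso88591",
    "de_DE.utf8",
    "en_GB.ISO-8859-1",
    "fr_FR.iso885915@euro",
]
print(latin1_locales(sample_listing))
```

On the server itself, latin1_locales(installed_locales()) would list the candidates. Whichever name is chosen should be copied exactly as locale -a prints it.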

However, if someone is pasting Windows-1252 data into the browser's text editing field, it is up to the browser to convert it to ISO-8859-1 or -15. It's possible the browser works better with the former since it has been around much longer, but there is little difference between the two character sets.
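To make "little difference" concrete, the byte values where ISO-8859-1 and ISO-8859-15 disagree can be listed with a short script. This is an illustrative Python sketch using the standard codecs only.

```python
# Collect every byte value that decodes differently under the two charsets.
differences = []
for byte in range(256):
    raw = bytes([byte])
    latin1 = raw.decode("iso-8859-1")
    latin9 = raw.decode("iso-8859-15")
    if latin1 != latin9:
        differences.append((byte, latin1, latin9))

for byte, latin1, latin9 in differences:
    print(
        f"0x{byte:02X}: iso-8859-1 U+{ord(latin1):04X}, "
        f"iso-8859-15 U+{ord(latin9):04X}"
    )
```

Only eight positions differ (0xA4, for example, becomes the euro sign in -15), and none of them lie in the 0x91 to 0x97 range, so neither charset on its own defines the Windows-1252 quotes and dashes.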

You could also try keeping the above locale settings while using the site charset override setting in configure to set the browser character set to Windows-1252, but that may also have problems. The real solution is to ensure that all data pasted in matches your server's site character set, which is derived from the locale.

-- RichardDonkin - 22 Dec 2006

Now that I've done some recent work on TWiki I18N, I can see that my answer above wasn't really correct. The problem is that somewhere along the line, TWiki's default ISO-8859-1 character encoding was "half-changed": the default locale's charset remained ISO-8859-1 while {Site}{CharSet} was changed to ISO-8859-15. The change to -15 has broken some other things, so I will be reverting it in a future SVN checkin, along with some other fixes (Bugs:Item3652 has details).

See InstallationWithI18N for how to install current versions of TWiki with I18N, taking account of this.

-- RichardDonkin - 31 Aug 2007

Topic attachments
Attachment | History | Size | Date | Who | Comment
configure.htm | r1 | 149.3 K | 2006-12-06 08:51 | UnknownUser | Configure output
Topic revision: r10 - 2007-08-31 - RichardDonkin
 
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.