Question
Upgrading to Dakar seems to have changed our wiki's character set from iso-8859-1 to iso-8859-15. This is causing Windows-1252 encoded dashes and smart quotes (characters x91 - x97) in old topics to be displayed incorrectly (as ? or a square depending on the browser).
The special characters in question appear to be illegal in both iso-8859-1 and iso-8859-15; however, with our Cairo install, the browsers were treating iso-8859-1 as Windows-1252. With the change to iso-8859-15 this is no longer happening. I have tried setting {UseLocale} = 1 and {Site}{Locale} = iso-8859-1 as per the docs, but it does not make any difference: pages are still rendered in iso-8859-15.
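For anyone wanting to check the claim about bytes x91 - x97, here is a minimal Python sketch (not part of TWiki; the helper names are made up for illustration). It decodes those bytes with each character set and reports whether they come out as control characters, which is what the ISO-8859 standards assign to that range, or as printable punctuation, which is what Windows-1252 assigns.

```python
import unicodedata

# Bytes 0x91-0x97: the smart quotes, bullet and dashes mentioned in the question.
SAMPLE = bytes(range(0x91, 0x98))

def category_of(byte_value, encoding):
    """Return the Unicode general category of one byte decoded with `encoding`."""
    char = bytes([byte_value]).decode(encoding)
    return unicodedata.category(char)

def is_control_range(encoding):
    """True if every sample byte decodes to a control character (category Cc)."""
    return all(category_of(b, encoding) == "Cc" for b in SAMPLE)

if __name__ == "__main__":
    for enc in ("iso-8859-1", "iso-8859-15", "windows-1252"):
        code_points = [f"U+{ord(bytes([b]).decode(enc)):04X}" for b in SAMPLE]
        print(enc, "control chars" if is_control_range(enc) else "punctuation", code_points)
```

So a browser that strictly honours either ISO-8859 label has nothing printable to show for these bytes; they only display as intended when the page is treated as Windows-1252.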
Ideally, I want to allow special characters in topics, but not in wiki words.
I can see three possible solutions:
- replace the special characters with their ascii equivalents
- convert to utf-8
- convince Dakar to render in iso-8859-1 or Windows-1252
I'm nervous about converting the whole installation to utf-8 (need to migrate/map the characters in all the old topics, complexity of utf8 in topic names etc.) and I'm not keen on losing the special characters with option 1.
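As a rough illustration of what options 1 and 2 would involve for the raw bytes of a single topic, here is a Python sketch. It assumes the old topic text really is Windows-1252; the function names and the choice of ASCII stand-ins are hypothetical, and it does not touch topic names or anything TWiki-specific.

```python
# Option 1: map the Windows-1252 punctuation (bytes 0x91-0x97) to ASCII stand-ins.
# The stand-ins below are one possible choice, not an official mapping.
ASCII_EQUIVALENTS = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2022": "*",   # bullet
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}

def to_ascii_punctuation(raw: bytes) -> bytes:
    """Replace the special punctuation and return ISO-8859-1 bytes.

    Raises UnicodeError if the input contains bytes undefined in Windows-1252,
    or other characters (such as the euro sign) that ISO-8859-1 cannot represent.
    """
    text = raw.decode("windows-1252")
    for special, plain in ASCII_EQUIVALENTS.items():
        text = text.replace(special, plain)
    return text.encode("iso-8859-1")

def to_utf8(raw: bytes) -> bytes:
    """Option 2: re-encode Windows-1252 bytes as UTF-8, keeping every character."""
    return raw.decode("windows-1252").encode("utf-8")
```

Either function would have to be run over every stored topic, which is the migration effort the question is worried about; option 2 keeps the characters but changes the encoding of the whole site.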
Configure output attached (without the locale setting).
Any help would be appreciated.
Thanks,
Martin
Environment
--
MartinRothbaum - 06 Dec 2006
Answer
Change the charset in both LocalSite.cfg and configure.
--
SteveStark - 06 Dec 2006
Thanks very much Steve - seems obvious, and I probably should have tried it already, but I was sticking with the advice in the InstallationWithI18N topic about (almost) never changing charset. Should this be updated for my situation?
I didn't do anything to set the locale/charset when we installed Cairo and there must be some users out there pasting Windows-1252 characters into wiki topics from MS Word etc.
--
MartinRothbaum - 06 Dec 2006
Go ahead and make the changes in the topic, as the content can always be reversed.
--
SteveStark - 06 Dec 2006
I would but I don't think I'm qualified to recommend this option on the basis of this one experience.
--
MartinRothbaum - 07 Dec 2006
If you are having problems with ISO-8859-15, you should check which locales are available on your machine, using locale -a and then choose one based on your country/language preferences that supports ISO-8859-1 - e.g. de_DE.iso-8859-1 if you are in Germany, but make sure you choose one listed on your server by locale -a.
The locale setting should not be just ISO-8859-1 but de_DE.iso-8859-1 - using the former may cause problems.
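The check described above can be sketched in Python as follows. This is only an illustration, assuming a Unix-like server where locale -a exists; the function names are hypothetical, and the matching simply looks for a language_COUNTRY prefix followed by a codeset spelled as some variant of ISO-8859-1 (for example iso88591).

```python
import subprocess

def iso_8859_1_locales(locale_names):
    """Keep names of the form lang_COUNTRY.codeset whose codeset is ISO-8859-1."""
    matches = []
    for name in locale_names:
        name = name.strip()
        base, dot, codeset = name.partition(".")
        # Drop any @modifier, then normalise spellings such as iso88591 / ISO-8859-1.
        codeset = codeset.split("@")[0].lower().replace("-", "").replace("_", "")
        if dot and "_" in base and codeset == "iso88591":
            matches.append(name)
    return matches

def installed_locales():
    """Run `locale -a` and return its output lines.

    The output is decoded as Latin-1 because some systems list locale
    aliases containing non-ASCII bytes.
    """
    result = subprocess.run(
        ["locale", "-a"], capture_output=True, encoding="latin-1", check=True
    )
    return result.stdout.splitlines()

if __name__ == "__main__":
    for name in iso_8859_1_locales(installed_locales()):
        print(name)
```

A bare ISO-8859-1 entry without a language and country part is deliberately filtered out, in line with the advice above.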
However, if someone has Windows-1252 data that they are pasting into the browser text editing field, it's up to the browser to convert this to ISO-8859-1 or -15. It's possible the browser works better with the former since it's been around much longer, but there is little difference between the characters.
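To make "little difference" concrete, this small Python sketch (the function name is hypothetical) compares the two character sets byte by byte using Python's built-in codecs and lists the positions where they disagree.

```python
def differing_bytes(enc_a="iso-8859-1", enc_b="iso-8859-15"):
    """Return (byte value, char in enc_a, char in enc_b) for every byte that differs."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        char_a, char_b = raw.decode(enc_a), raw.decode(enc_b)
        if char_a != char_b:
            diffs.append((value, char_a, char_b))
    return diffs

if __name__ == "__main__":
    for value, char_a, char_b in differing_bytes():
        print(f"0x{value:02X}: U+{ord(char_a):04X} vs U+{ord(char_b):04X}")
```

Only eight byte values differ, all in the range 0xA4 to 0xBE (0xA4, for instance, becomes the euro sign in ISO-8859-15). None of them fall in the 0x91 - 0x97 range from the question, so as far as the standards go, neither charset defines printable characters for those bytes.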
You could also try keeping the above locale settings while using the site charset override setting in configure to set the browser character set to Windows-1252, but that may also have problems. The real solution is to ensure that all data pasted in matches your server's site character set, which is derived from the locale.
--
RichardDonkin - 22 Dec 2006
Now that I've done some recent work on TWiki I18N, I can see that my answer above wasn't really correct. The problem is that somewhere along the line, TWiki's default ISO-8859-1 character encoding was "half-changed": the default locale's charset remained ISO-8859-1 while the {Site}{CharSet} was changed to ISO-8859-15. The change to -15 has broken some other things so I will be reverting this in a future SVN checkin, along with some other fixes (Bugs:Item3652 has details).
See InstallationWithI18N for how to install current versions of TWiki with I18N, taking account of this.
--
RichardDonkin - 31 Aug 2007