(Copied from ProposedUTF8SupportForI18N)
I would like to make a small remark. I installed the latest stable release on Windows 2000 using Cygwin, and all was OK except that Polish characters were displayed in the inline Edit box as Unicode references, like &#261;. I didn't find any solution in the Support and Dev webs, and tried various options. Finally I replaced the $siteCharset value ISO-etc. with UTF-8 in twiki\lib\TWiki.pm and now all is OK.
Please consider this small change.
--
AndrzejGoralczyk - 14 Feb 2004
Did you try setting the $siteLocale to something like pl_PL.ISO-8859-2? That's the correct character set for Eastern European characters, including those used in Polish. UTF-8 doesn't really work that well at present, although work is going on as part of the Phase 2 UTF-8 plan - so you might find ISO-8859-2 works better right now. Conversion of non-ISO-8859-1 characters to numeric character references such as &#261; is what happens if you paste or type ISO-8859-2 characters when your browser page is ISO-8859-1 (the default for TWiki).
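The conversion described above can be reproduced outside TWiki. The following is a minimal sketch in Python (not TWiki's Perl code); the helper name `submit_as` is hypothetical, and it only imitates the browser behaviour of falling back to numeric character references when the page charset cannot represent a character.

```python
# Sketch only: imitates what a browser does when form text contains a
# character that the page charset cannot represent.
def submit_as(text: str, page_charset: str) -> bytes:
    """Encode text in the page charset, replacing unrepresentable
    characters with numeric character references (NCRs)."""
    return text.encode(page_charset, errors="xmlcharrefreplace")

polish = "ąę"

# ISO-8859-1 has no ą or ę, so both become NCRs.
latin1_result = submit_as(polish, "iso-8859-1")
print(latin1_result)  # b'&#261;&#281;'

# ISO-8859-2 has both, so each is sent as a single byte.
latin2_result = submit_as(polish, "iso-8859-2")
print(latin2_result)  # b'\xb1\xea'
```

With the page served as ISO-8859-2, the characters travel as ordinary bytes and no references appear in the edit box.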
Even when Unicode is supported, using ISO-8859-* character sets will tend to give better sorting of Polish WikiWords in WebIndex until Phase 3 of the Unicode support is done (language-sensitive collation is a tough problem that is handled automatically by locale setups - Unicode doesn't work well with locales, so something else will need to be done).
Please post a Support question with your details, including the HTML output of the latest testenv (CVSget:bin/testenv), and I should be able to resolve this. See SupportGuidelines for details.
I have now enhanced the installation guide to cover internationalisation setup, which should clarify this sort of issue. See the TWiki installation guide, step 4.
Also, note that CygWin has broken locales (testenv warns about this), so you'll need to set $localeRegexes to 0 and specify the required upper and lower case national characters - the installation guide talks about how to do this and there are comments in TWiki.cfg as well.
Let me know how you get on with this!
--
RichardDonkin - 16 Feb 2004
Thank You, Richard.
In fact, I tried various patterns for the locale setup, and some other directions from the Perl documentation, and finally came to the conclusion that locales aren't supported: the command locale -a doesn't work, and it is impossible to get out of the C state of the site locale. Consequently, I went to TWiki.pm and set siteCharset, which is used when the locale isn't supported. I also set the charset accordingly, to UTF, in TWikiPreferences (correction: not in the main template). This way I eliminated the most painful garbage from the edit window, although it is not full support for the Polish language.
Of course, I can try ISO-8859-2, and also $localeRegexes, and will let you know about the effects.
--
AndrzejGoralczyk - 16 Feb 2004
I realised half-way through my comment that Cygwin is the main reason that I18N is not working, but the rest applies to anyone using TWiki for Polish on Linux/Unix - locales just don't work on Cygwin Perl (or ActiveState Perl). However, you can use ISO-8859-2 to ensure characters aren't changed into &#261; type codes, and also $localeRegexes = 0 to make TWiki use the contents of the $upperNational and $lowerNational parameters. I tested this while developing the I18N code on Cygwin, so it should work OK, and will enable WikiWords with Polish accented characters.
From my Debian Linux box, with a working locale of pl_PL.ISO-8859-2, I've attached what Perl 5.6 thinks are valid alphanumeric characters - you should be able to set the $upperNational and $lowerNational parameters from this (view with ISO-8859-2 turned on in your browser/editor).
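For readers without access to a working locale, a similar list can be approximated without one. The sketch below is Python, not the Perl 5.6 output attached above, and it uses Python's Unicode case tables rather than a pl_PL locale, so the result may differ slightly from what a locale reports. The function name `national_letters` is hypothetical.

```python
# Sketch only: list the upper- and lower-case letters in the high half
# of a single-byte charset, as candidates for $upperNational and
# $lowerNational. Uses Unicode case data, not a system locale.
def national_letters(charset: str = "iso-8859-2"):
    upper, lower = [], []
    for byte in range(0xA0, 0x100):
        ch = bytes([byte]).decode(charset)
        if ch.isupper():
            upper.append(ch)
        elif ch.islower():
            lower.append(ch)
    return "".join(upper), "".join(lower)

upper, lower = national_letters()
print(upper)
print(lower)

# The Polish accented letters should all be present in the result.
polish_upper = "ĄĆĘŁŃÓŚŹŻ"
polish_lower = "ąćęłńóśźż"
print(all(c in upper for c in polish_upper))
print(all(c in lower for c in polish_lower))
```

The output covers every accented letter in ISO-8859-2, not just the Polish ones, which matches the idea of listing all national characters the charset supports.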
--
RichardDonkin - 16 Feb 2004
Semi-related followup in BulgarianLanguageSetup.
--
WillNorris - 16 Feb 2004
I have tested the above. I changed $siteCharset to ISO-8859-2 in TWiki.pm, set $localeRegexes to 0 in TWiki.cfg, and typed in $upperNational and $lowerNational. I also set ISO-8859-2 in TWikiPreferences (see the correction above).
Results: Polish characters ("ogonki") are displayed well in the HTML page and in the edit box. However, Polish characters are not supported in WikiNames. Also, the single quotation character is displayed incorrectly, as ' (U+001A, ESCAPE), in some RSS feeds (for example Scotsman UK News). To this extent TWiki is working exactly as it worked with UTF-8.
The only difference I noticed concerns the British pound character £. In ISO-8859-2 it is displayed in HTML as Polish Ł, which is acceptable. In UTF-8, it is displayed incorrectly in Scotsman's feed (RSS v. 2.0), and as the well-known question mark in a black square in BBC feeds (RSS v. 0.91).
--
AndrzejGoralczyk - 21 Feb 2004
Hi - I'm surprised that WikiWords / WikiNames are not working. Do you have $useLocale set to 0? If so, set it to 1.
Can you attach a test page (the .txt file), as well as your TWiki.cfg and testenv HTML output?
WikiWords should definitely work in ISO-8859-2, even without a working locale - that's why I put in the $upperNational etc features.
As for the British pound character, this is not a character in ISO-8859-2, only in ISO-8859-1 - to display it you'll need to use &#163; (Unicode U+00A3). If there are many non-ISO-8859-2 characters that you need, you might want to try Unicode as long as you don't need all of TWiki's features to work (e.g. searching is broken at present) and are OK with updating to new alpha releases over the next few months.
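The pound sign turning into Ł follows from both character sets using the same byte value for different characters. A minimal Python sketch (for illustration only, not part of TWiki) shows the three cases involved:

```python
# Sketch only: the byte 0xA3 means different things in the two charsets.
POUND_BYTE = b"\xa3"

as_latin1 = POUND_BYTE.decode("iso-8859-1")
as_latin2 = POUND_BYTE.decode("iso-8859-2")
print(as_latin1)  # £
print(as_latin2)  # Ł

# ISO-8859-2 has no pound sign, so it has to be written as an NCR.
ncr = "£".encode("iso-8859-2", errors="xmlcharrefreplace")
print(ncr)  # b'&#163;'
```

A feed that sends 0xA3 as ISO-8859-1 and is then shown on an ISO-8859-2 page will therefore show Ł unless the character is converted to a reference first.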
One note - you should not need to set the character set in TWiki.pm, TWikiPreferences or the templates - leave the templates using %CHARSET%, so that the TWiki I18N code can set this based on the $siteLocale in TWiki.cfg. Even if you don't have a working locale you should still set the $siteLocale (and $useLocale = 1) for this reason. A sample config that should work in TWiki.cfg is:
$useLocale = 1;
$siteLocale = "pl_PL.ISO-8859-2";
$localeRegexes = 0;
Leave the $upperNational and $lowerNational parameters set to the Polish accented characters. This should then 'just work'!
RSS feeds are something of a challenge and not really addressed by the current I18N code - because they come from so many sites and may be expected by RSS newsreaders in so many (different) charsets, Unicode NCRs are probably the only viable format. Since RSS feeds are read-only that shouldn't be a problem. However, I'm sure this will get more complex on examination! Now that I have the code to convert from any native character set to/from Unicode, it should not be too hard to solve the RSS issue in the alpha releases.
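As a rough illustration of the NCR idea for feeds, the Python sketch below converts text stored in a native charset into plain ASCII with numeric character references. It is not the TWiki conversion code mentioned above; the function name `to_ncr_ascii` is hypothetical, and it does not escape markup characters such as & or <.

```python
# Sketch only: turn bytes in a native charset into charset-independent
# ASCII, using NCRs for everything outside ASCII.
def to_ncr_ascii(raw: bytes, source_charset: str) -> str:
    text = raw.decode(source_charset)
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Text as it would be stored on an ISO-8859-2 site.
stored = "Łódź".encode("iso-8859-2")
print(to_ncr_ascii(stored, "iso-8859-2"))  # &#321;&#243;d&#378;
```

Because the result is pure ASCII, it reads the same whichever charset the newsreader or the embedding page assumes.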
If you have a public TWiki site, a URL would be very helpful so I can see all the issues, but I definitely need to see your TWiki.cfg and testenv output.
By the way, I keep getting bounce backs from your email address.
--
RichardDonkin - 21 Feb 2004
This was fixed by configuring TWiki.cfg to use the settings mentioned above.
--
RichardDonkin - 21 Mar 2004