PolishLanguageSetup < Codev

(Copied from ProposedUTF8SupportForI18N)

I would like to make a small remark. I installed the last stable release on Windows 2000 using Cygwin, and all was OK except that Polish characters were displayed in the inline Edit box as Unicode codes, like &#261;. I didn't find any solution in the Support and Codev webs, and tried various options. Finally I replaced the $siteCharset value (ISO-etc.) with UTF-8 in twiki\lib\TWiki.pm and now all is OK.

Please consider this small exchange.

-- AndrzejGoralczyk - 14 Feb 2004

Did you try setting the $siteLocale to something like pl_PL.ISO-8859-2? That's the correct character set for Eastern European characters, including those used in Polish. UTF-8 doesn't really work that well at present, although work is going on as part of the Phase 2 UTF-8 plan - so you might find ISO-8859-2 works better right now. Conversion of non-ISO-8859-1 characters to numeric character references such as &#261; is what happens if you paste or type ISO-8859-2 characters when your browser page is ISO-8859-1 (the default for TWiki).
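The conversion described above can be reproduced outside TWiki. This is a minimal sketch in Python (not TWiki's Perl code, and not how any particular browser is implemented); it only assumes that a character with no encoding in the page's charset falls back to a numeric character reference. The helper name submit_as is made up for the illustration.

```python
# Illustrative only: mimics text typed into a form on a page
# that is served in a given charset.
def submit_as(text, page_charset):
    """Encode text for a charset, falling back to numeric character references."""
    return text.encode(page_charset, errors="xmlcharrefreplace")

# ISO-8859-1 has no "ą", so it degrades to a numeric character reference.
latin1_result = submit_as("ą", "iso-8859-1")

# ISO-8859-2 has "ą" as a single byte, so nothing is converted.
latin2_result = submit_as("ą", "iso-8859-2")

print(latin1_result)
print(latin2_result)
```

The first result is the kind of code that shows up in the edit box; the second is what an ISO-8859-2 page would carry instead.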

Even when Unicode is supported, using ISO-8859-* character sets will tend to give better sorting of Polish WikiWords in WebIndex until Phase 3 of the Unicode support is done (language-sensitive collation is a tough problem that is handled automatically by locale setups - Unicode doesn't work well with locales, so something else will need to be done).
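To show why collation needs language knowledge, here is a small Python sketch comparing a plain code-point sort with a sort that follows the Polish alphabet. It is not how TWiki or a system locale does it; the POLISH_ALPHABET string and the polish_key helper are assumptions made for the example, and a working locale would normally supply this ordering.

```python
# Polish alphabet order (lower case), written out by hand for the example.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {ch: i for i, ch in enumerate(POLISH_ALPHABET)}

def polish_key(word):
    """Sort key: rank of each letter in the Polish alphabet, case-insensitive."""
    # Characters outside the alphabet sort after all letters, by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]

words = ["Zebra", "Łódź", "Lato", "Ćma", "Cena"]

by_code_point = sorted(words)                # naive ordering
by_alphabet = sorted(words, key=polish_key)  # language-sensitive ordering

print(by_code_point)
print(by_alphabet)
```

With the naive sort the accented words end up after "Zebra"; with the alphabet-aware key "Ćma" follows "Cena" and "Łódź" follows "Lato".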

Please post a Support question with your details, including the HTML output of latest testenv (CVSget:bin/testenv), and I should be able to resolve this. See SupportGuidelines for details.

I have now enhanced the installation guide to cover internationalisation setup, which should clarify this sort of issue. See the TWiki installation guide, step 4.

Also, note that Cygwin has broken locales (testenv warns about this), so you'll need to set $localeRegexes to 0 and specify the required upper and lower case national characters - the installation guide talks about how to do this and there are comments in TWiki.cfg as well.

Let me know how you get on with this!

-- RichardDonkin - 16 Feb 2004

Thank You, Richard.

In fact, I tried various patterns for the locale setup, and some other directions from the Perl documentation, and finally came to the conclusion that locales aren't supported: the command locale -a doesn't work, and it is impossible to get out of the C state of the site locale. Consequently, I went to TWiki.pm and set $siteCharset, which is used when the locale isn't supported. I also set the charset accordingly, to UTF-8 (correction: in TWikiPreferences, not in the main template). This way I eliminated the most painful garbage from the edit window, although it is not full support for the Polish language.

Of course, I can try ISO-8859-2, and also $localeRegexes, and will let you know the results.

-- AndrzejGoralczyk - 16 Feb 2004

I realised half-way through my comment that Cygwin is the main reason that I18N is not working, but the rest applies to anyone using TWiki for Polish on Linux/Unix - locales just don't work on Cygwin Perl (or ActiveState Perl). However, you can use ISO-8859-2 to ensure characters aren't changed into &#261; type codes, and also $localeRegexes = 0 to make TWiki use the contents of the $upperNational and $lowerNational parameters. I tested this while developing the I18N code on Cygwin, so it should work OK, and will enable WikiWords with Polish accented characters.

From my Debian Linux box, with a working locale of pl_PL.ISO-8859-2, I've attached what Perl 5.6 thinks are valid alphanumeric characters - you should be able to set the $upperNational and $lowerNational parameters from this (view with ISO-8859-2 turned on in your browser/editor).
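As a rough illustration of what the national-character parameters buy you, the Python sketch below builds a simplified WikiWord pattern with and without the Polish letters. This is not TWiki's actual regex or code; the pattern, the wikiword_pattern helper, and the two constants are assumptions for the example, and the letter lists should be checked against the attached file.

```python
import re

# Polish accented letters, all present in ISO-8859-2 (assumed lists for
# illustration; compare with the attachment before using them).
UPPER_NATIONAL = "ĄĆĘŁŃÓŚŹŻ"
LOWER_NATIONAL = "ąćęłńóśźż"

def wikiword_pattern(upper_national="", lower_national=""):
    """Simplified WikiWord pattern: Upper, lower/digits, Upper, then anything alphanumeric."""
    upper = "A-Z" + upper_national
    lower = "a-z0-9" + lower_national
    return re.compile(f"^[{upper}]+[{lower}]+[{upper}][{upper}{lower}]*$")

ascii_only = wikiword_pattern()
with_polish = wikiword_pattern(UPPER_NATIONAL, LOWER_NATIONAL)

# Every listed letter fits in one ISO-8859-2 byte.
single_byte = all(len(ch.encode("iso-8859-2")) == 1
                  for ch in UPPER_NATIONAL + LOWER_NATIONAL)

print(bool(ascii_only.match("ŁódźMiasto")))
print(bool(with_polish.match("ŁódźMiasto")))
print(single_byte)
```

Without the national characters the accented name is rejected at its first letter; with them it is accepted, while plain ASCII WikiWords match either way.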

-- RichardDonkin - 16 Feb 2004

semi-related followup in BulgarianLanguageSetup

-- WillNorris - 16 Feb 2004

I have tested the above. I changed $siteCharset to ISO-8859-2 in TWiki.pm, set $localeRegexes to 0 in TWiki.cfg, and typed in $upperNational and $lowerNational. I also set ISO-8859-2 in TWikiPreferences (see correction above).

Results: Polish characters ("ogonki") are displayed correctly in the HTML page and in the edit box. However, Polish characters are not supported in WikiNames. Also, the single quotation character is displayed incorrectly, as ' (U+001A, ESCAPE), in some RSS feeds (for example Scotsman UK News). In this respect TWiki works exactly as it did with UTF-8.

The only difference I noticed concerns the British pound character £. In ISO-8859-2 it is displayed in HTML as the Polish Ł, which is acceptable. In UTF-8, it is displayed as &#xA in Scotsman's feed (RSS v. 2.0), and as the familiar question mark in a black square in BBC feeds (RSS v. 0.91).

-- AndrzejGoralczyk - 21 Feb 2004

Hi - I'm surprised that WikiWords / WikiNames are not working. Do you have $useLocale set to 0? If so, set it to 1.

Can you attach a test page (the .txt file), as well as your TWiki.cfg and testenv HTML output? WikiWords should definitely work in ISO-8859-2, even without a working locale - that's why I put in the $upperNational etc features.

As for the British pound character, this is not a character in ISO-8859-2, only in ISO-8859-1 - if you want to display it you'll need to use the numeric character reference for £ (Unicode U+00A3). If there are many non-ISO-8859-2 characters that you need, you might want to try Unicode as long as you don't need all of TWiki's features to work (e.g. searching is broken at present) and are OK with updating to new alpha releases over the next few months.
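The pound sign behaviour can be checked byte by byte. This is a small Python sketch, not TWiki code, and it assumes the feed sends the pound sign as the single ISO-8859-1 byte 0xA3; the variable names are made up for the illustration.

```python
# The pound sign as a single ISO-8859-1 byte (assumed feed content).
POUND_BYTE = b"\xa3"

# The same byte read under three different charset assumptions.
as_latin1 = POUND_BYTE.decode("iso-8859-1")
as_latin2 = POUND_BYTE.decode("iso-8859-2")
as_utf8 = POUND_BYTE.decode("utf-8", errors="replace")

# Going the other way: the pound sign has no byte in ISO-8859-2,
# so it has to be written as a numeric character reference.
pound_in_latin2 = "£".encode("iso-8859-2", errors="xmlcharrefreplace")

print(as_latin1, as_latin2, repr(as_utf8), pound_in_latin2)
```

Read as ISO-8859-2 the byte is Ł, which matches what was reported; read as UTF-8 it is invalid on its own and becomes the replacement character that browsers draw as a question mark in a black shape.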

One note - you should not need to set the character set in TWiki.pm, TWikiPreferences or the templates - leave the templates using %CHARSET%, so that the TWiki I18N code can set this based on the $siteLocale in TWiki.cfg. Even if you don't have a working locale you should still set the $siteLocale (and $useLocale = 1) for this reason. A sample config that should work in TWiki.cfg is:

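$useLocale = 1;                        # enable the I18N/locale handling
$siteLocale = "pl_PL.ISO-8859-2";      # site charset is derived from this
$localeRegexes = 0;                    # use $upperNational/$lowerNational instead of locale regexes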

Leave the $upperNational and $lowerNational parameters set to the Polish accented characters. This should then 'just work'!

RSS feeds are something of a challenge and not really addressed by current I18N code - because they come from so many sites and may be expected by RSS newsreaders in so many (different) charsets, Unicode NCRs are probably the only viable format. Since RSS feeds are read-only that shouldn't be a problem. However, I'm sure this will get more complex on examination! Now that I have the code to convert from any native character set to/from Unicode, it should not be too hard to solve the RSS issue in the alpha releases.
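To make the NCR idea concrete, here is a short Python sketch of converting feed text from its native charset into plain ASCII with Unicode NCRs. It is not the TWiki conversion code mentioned above; the to_ncr helper is hypothetical, and it assumes the feed's charset is already known and that markup characters such as & and < are escaped elsewhere.

```python
def to_ncr(raw_bytes, source_charset):
    """Decode bytes from their native charset, then write non-ASCII as NCRs."""
    text = raw_bytes.decode(source_charset)
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# The same function handles input from different native charsets.
polish = to_ncr("Łódź".encode("iso-8859-2"), "iso-8859-2")
price = to_ncr("£5".encode("iso-8859-1"), "iso-8859-1")

print(polish)
print(price)
```

Because the output is pure ASCII, it displays the same way whatever charset the page embedding the feed is served in.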

If you have a public TWiki site, a URL would be very helpful so I can see all the issues, but I definitely need to see your TWiki.cfg and testenv output.

By the way, I keep getting bounce backs from your email address.

-- RichardDonkin - 21 Feb 2004

This was fixed by configuring TWiki.cfg to use the settings mentioned above.

-- RichardDonkin - 21 Mar 2004

WebForm
TopicClassification	BugRejected
TopicSummary	Problems with Polish language support on Win2000 with CygWin
InterestedParties
AssignedTo
AssignedToCore	RichardDonkin
ScheduledFor
RelatedTopics	InternationalisationEnhancements
SpecProgress
ImplProgress
DocProgress

Attachments

Topic attachments
I	Attachment	History	Action	Size	Date	Who	Comment
gif	polish-chars-iso-8859-2.gif	r1	manage	3.1 K	2004-02-16 - 15:08	UnknownUser	Screenshot of accented characters (ISO-8859-2)
txt	polish-chars-iso-8859-2.txt	r1	manage	0.1 K	2004-02-16 - 15:02	UnknownUser	Accented characters (ISO-8859-2)

Topic revision: r7 - 2004-03-21 - RichardDonkin

Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.