Slogans borrowed from the Mozilla I18N project
For help in updating plugins or core code for Internationalisation, see InternationalisationGuidelines. This page is a gateway and discussion point for developers working on I18N. It is mainly a collection of resources useful to such developers.
Related pages: InternationalisationDiscuss, InternationalisationIssues, UnicodeSupport, InternationalisationUTF8, ProposedUTF8SupportForI18N, EncodeURLsWithUTF8, CyrillicSupport, JapaneseAndChineseSupport, UserInterfaceInternationalisation, UserInterfaceLocalisation, BiDirectionalText
grep will be necessary for searching to work properly - GNU grep works fine and is available on virtually any platform.
Use of locales is controlled by a configure setting, and the locale is site-wide for simplicity. More complex setup of locales may be possible in future, but there are security issues with allowing web users to set their own locale variables.
Type about:config into the URL bar
Type utf into the filter field that appears
Change the network.standard-url.encode-utf8 line so that it says true
Change the network.standard-url.escape-utf8 line so that it says true
[[WikiWord]] type links.
[[Sandbox.LaLangueFrançaise][LaLangueFrançaise]]
%CHARSET% variable with any browser. The standard TWiki templates are now fixed in the TWikiAlphaRelease to work with I18N web names and WikiWords using Mozilla - this is because Mozilla decides to UTF8-encode URLs if they are used as a form submission URL, even though the whole page is in ISO-8859-1 mode and other URLs are never encoded.
To make any skin work with the new I18N support, some simple changes are needed to any form submission URLs:
Find all the <form> elements in your skin templates - e.g. grep -i '<form' *.tmpl under Unix/Linux/CygWin.
Edit the form submission URLs (always within the <form tag, and always part of the action="http://foo" attribute) so that the variables %WEB%, %BASEWEB%, %INCLUDINGWEB% and %TOPIC% are properly URL encoded. For example, to URL encode .../%WEB%/%TOPIC% write .../%INTURLENCODE{"%WEB%/%TOPIC%"}%
. =, not =
- this helps to ensure that your skin will work smoothly in the future, when TWiki eventually supports UTF8 throughout.
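To illustrate what "properly URL encoded" means for a web/topic path, here is a minimal Perl sketch of percent-encoding. It is not TWiki's implementation of %INTURLENCODE%, which is not shown on this page; the function name url_encode_path is hypothetical, and the sketch assumes the site charset is ISO-8859-1, so each non-ASCII character is a single byte.

```perl
use strict;
use warnings;

# Hypothetical helper, not TWiki's own code: percent-encode every byte that
# is not an unreserved URL character or a '/' path separator.
sub url_encode_path {
    my ($text) = @_;
    $text =~ s{([^0-9A-Za-z\-_.~/])}{sprintf('%%%02X', ord($1))}ge;
    return $text;
}

# "\xE7" is the ISO-8859-1 byte for c-cedilla (assumed site charset).
print url_encode_path("Sandbox/LaLangueFran\xE7aise"), "\n";
# prints Sandbox/LaLangueFran%E7aise
```

Encoding byte by byte keeps the URL in the same charset as the page, which matches the ISO-8859-1 setup described above; a UTF-8 site would produce two escaped bytes for the same character.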
%CHARSET% variable instead of iso-8859-1:
<head> ... <meta http-equiv="Content-Type" content="text/html; charset=%CHARSET%" /> </head>
Skins for TWikiSyndication should use names of the form 'rss*' - this ensures that the TWiki code knows it is handling RSS data, which requires I18N characters (i.e. with 8th bit set) to be encoded as &#nnn; sequences.
Numeric character references such as &#1562; are drawn from the Unicode (ISO 10646-1) character set, whose first 256 codepoints are the same as ISO-8859-1. These entities always refer to the same character, regardless of the document's character encoding, according to the HTML 4.0 spec.
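The RSS rule above can be sketched in a few lines of Perl. This is an illustration only, not the code TWiki uses: the function name encode_high_bit_chars is hypothetical, and it assumes the text is ISO-8859-1, where each byte value equals the Unicode codepoint, so ord() gives the right number directly.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character with the 8th bit set by a
# numeric character reference. Valid only for ISO-8859-1 input, where the
# byte value is also the Unicode codepoint.
sub encode_high_bit_chars {
    my ($text) = @_;
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# "\xE7" is c-cedilla in ISO-8859-1, codepoint 231.
print encode_high_bit_chars("LaLangueFran\xE7aise"), "\n";
# prints LaLangueFran&#231;aise
```

For any other 8-bit charset the byte would first have to be mapped to its Unicode codepoint, since the reference always names a Unicode character.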
%SEARCH% into (Italian example) %CERCA%
=?iso-8859-1?q?=20Smith?=
libiconv NOTES file - good overview of the character codings actually used world-wide
http-equiv is useful for offline browsing of HTML documents; when online, HTTP headers take precedence.
http-equiv is used to set charset
use locale. Doing $^H at compile time works, but is not really necessary for TWiki.
\w regex in the fr_FR.ISO8859-1 locale matches '-' as well as '_', which is a minor issue, while on CygWin there is no locale support at all, and on ActivePerl, uppercasing a character can lead to a completely different and even non-alphabetic character! In Perl 5.8 on another Debian system, using the locale fr_FR.UTF8 meant that the collation order was as for ASCII, and a Japanese (Kanji) character was included in the set of alphabetic characters...
This means that workarounds will be essential for many people, so this code will make it easy to avoid using any locale functions if $useLocale is turned off - basically, this will involve typing a list of upper and lower case non-ASCII national characters into TWiki.cfg variable settings. This will help with features handled entirely by TWiki, such as WikiWords, but won't address external programs, for which the only solution is to report the bugs to whoever maintains them, or perhaps install different versions of such programs.
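As a rough illustration of the no-locale workaround, the sketch below builds WikiWord character classes from explicit lists instead of relying on \w. The names upperNational, lowerNational and mixedAlphaNum appear elsewhere on this page, but the values, the exact pattern and the helper is_wiki_word are assumptions made for the example, not the released TWiki code.

```perl
use strict;
use warnings;

# Assumed example values standing in for the TWiki.cfg settings, written as
# ISO-8859-1 byte escapes so this file stays pure ASCII.
my $upperNational = "\xC0\xC7\xC9\xD6";    # A-grave, C-cedilla, E-acute, O-umlaut
my $lowerNational = "\xE0\xE7\xE9\xF6";    # the matching lower-case letters

# Character classes built by hand, so no locale function is ever called.
my $upperAlpha    = "A-Z$upperNational";
my $lowerAlpha    = "a-z$lowerNational";
my $mixedAlphaNum = "${upperAlpha}${lowerAlpha}0-9";

# Assumed WikiWord shape: upper, lower, upper, then any letters or digits.
my $wikiWordRegex =
    qr/^[$upperAlpha]+[$lowerAlpha]+[$upperAlpha]+[$mixedAlphaNum]*$/;

# Hypothetical helper for the example.
sub is_wiki_word {
    my ($text) = @_;
    return $text =~ $wikiWordRegex ? 1 : 0;
}

print is_wiki_word("LaLangueFran\xE7aise"), "\n";    # prints 1
print is_wiki_word("Fran\xE7aise"), "\n";            # prints 0
```

Because the classes list characters explicitly, the result does not depend on whether the platform's locale support is broken or missing; the cost is that the site administrator has to maintain the two lists.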
UPDATE: I've coded most of this - all the basic link types are working, apart from anchors and upper casing in spaced-out WikiWords. There's a test page up at http://donkin.org/bin/view/Test/TestTopic5 running on this code - not yet in TWikiAlphaRelease as I'd like to test it a bit more, but it seems to work OK. It's been tested in no-locale mode only so far, so will work on broken locales. I really need Perl 5.6 on a system with working locales to test this - will probably have to install Perl 5.6 on Debian.
-- RichardDonkin - 26 Nov 2002
I've now got sorting of the WikiWords in WebIndex working - turns out that ls on my Debian is locale-unaware, but TWiki sorts the output anyway in Perl, so it works with only a five-line change to Search.pm. Locales are also working fine under Perl 5.005_03.
-- RichardDonkin - 29 Nov 2002
Now in TWikiAlphaRelease - please test this out and log any bugs! It's quite easy to set up if you have a working locale on your system. Be sure to review #Browser_setup for a simple browser config change required for this to work.
-- RichardDonkin - 30 Nov 2002
More links about what other Wikis are doing in this area - PhpWiki is quite a way ahead, in that it actually ships with translated pages for several languages and already supports PhpWiki:DoubleByteCharacters. MoinMoin also ships with translated pages and has Unicode character support.
-- RichardDonkin - 02 Dec 2002
Now released as part of TWikiRelease01Feb2003 and running on TWiki.org (with I18N turned off).
(Discussion refactored to InternationalisationDiscuss; any bugs should be reported via BugReports as normal, and linked from InternationalisationIssues as well.)
-- RichardDonkin - 16 Feb 2003
configure interface. Check out lib/TWiki.cfg
-- AntonioTerceiro - 04 Nov 2005
No, I mean, it was just renamed to {UserInterfaceInternationalisation} (in SVN).
-- AntonioTerceiro - 06 Nov 2005
Does working (i18n) code exist for capitalizing wiki word to WikiWord?
-- ArthurClemens - 22 Mar 2006
There's some code in SVN:TWiki/Render.pm that looks like this - it will work with I18N as long as locales are properly set up, but it probably won't work in 'locale regexes off' mode:
# Turn spaced-out names into WikiWords - upper case first letter of
# whole link, and first of each word. TODO: Try to turn this off,
# avoiding spaces being stripped elsewhere
$theTopic =~ s/^(.)/\U$1/;
$theTopic =~ s/\s([$TWiki::regex{mixedAlphaNum}])/\U$1/go;
So this is something of an I18N bug - requires code that uses upperNational and lowerNational to do upper-casing, which is not trivial since some lower case letters don't exist as upper case (e.g. German ß). Probably not worth fixing unless someone has this issue and the time to fix it.
-- RichardDonkin - 30 Mar 2006
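A minimal sketch of the upperNational / lowerNational approach mentioned above, for 'locale regexes off' mode. It is not code from TWiki: the helper names are hypothetical, and it assumes the two lists are kept in matching order, so the Nth lower-case character maps to the Nth upper-case one. Characters with no entry, such as ß, are left unchanged.

```perl
use strict;
use warnings;

# Assumed paired lists (ISO-8859-1 byte escapes); positions must correspond.
my $lowerNational = "\xE0\xE7\xE9\xF6";
my $upperNational = "\xC0\xC7\xC9\xD6";

# Hypothetical helper: upper-case one character without any locale support.
sub upper_char {
    my ($char) = @_;
    my $pos = index($lowerNational, $char);
    return substr($upperNational, $pos, 1) if $pos >= 0;
    return $char =~ /^[a-z]$/ ? uc($char) : $char;    # ASCII only; others untouched
}

# Hypothetical helper: turn a spaced-out name into a WikiWord.
sub to_wiki_word {
    my ($text) = @_;
    $text =~ s/^(.)/upper_char($1)/e;         # first letter of the whole link
    $text =~ s/\s+(\S)/upper_char($1)/ge;     # first letter of each later word
    return $text;
}

print to_wiki_word("wiki word"), "\n";    # prints WikiWord
```

With these lists, to_wiki_word("\xE9t\xE9 chaud") gives "\xC9t\xE9Chaud", while a name starting with ß (\xDF) keeps its first character as is, which is the untranslatable case noted above.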
I installed the TWiki DakarRelease, but I found that Chinese topic titles cannot be displayed correctly. Moreover, they break the page format. I copied the page TWikiQickStart from stlchina at http://www.stlchina.org (a Chinese TWiki site), but it does not display the same as it does on stlchina. Please check the attached files for details.
-- ZhengLingxiang - 05 Apr 2006
It's best if you create a new support request under the Support web. See SupportGuidelines on how to do this.
Your raw.txt attachment is quite interesting - it is using either GB2312 or GBK character encoding. Neither of these is supported by TWiki (see JapaneseAndChineseSupport for details), since some Chinese characters in these encodings include ASCII characters that are processed (parsed) by TWiki (e.g. [), which will cause your page text to be displayed incorrectly.
From your configure.htm output, it seems you are using UTF-8, which explains why pasting in text in GBK didn't work.
-- RichardDonkin - 05 Apr 2006
The text is saved in UTF-8 format in the wiki page. If I just preview the topic when editing, everything works fine. But after I save it, the page cannot be displayed properly. The raw.txt is in GBK just because I saved it in this format.
-- ZhengLingxiang - 05 Apr 2006
I'll need more information to help further - the exact error case you are seeing needs to be clearly explained. I don't read Chinese, so please be very specific as to exactly which characters don't work. SupportGuidelines is a good place to start.
-- RichardDonkin - 05 Apr 2006
I did some more tests and created a new support page, ChineseHeadlineBrokenPageFormat.
-- ZhengLingxiang - 06 Apr 2006
As far as I can see, the site lang doesn't get used. I changed line 141 to:
my $userLanguage = _normalize_language_tag($session->{prefs}->getPreferencesValue('LANGUAGE')) || $TWiki::cfg{Site}{Lang};
Now it will use the site lang if there is no user pref.
-- AdamHyde - 08 May 2008
Correct - and it has been removed.
-- CrawfordCurrie - 31 May 2008
The Lang (more recently Site Lang) was intended for future use when we eventually supported multiple languages, but this was never implemented.
-- RichardDonkin - 14 Jun 2008
I | Attachment | History | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|---|
zip | configure.html.zip | r1 | manage | 29.4 K | 2006-04-05 - 08:40 | UnknownUser | twiki configure of my server |
jpg | format_wrong.JPG | r1 | manage | 200.0 K | 2006-04-05 - 08:44 | UnknownUser | wrong page display |
txt | raw.txt | r1 | manage | 5.5 K | 2006-04-05 - 08:44 | UnknownUser | raw text of the chinese version TWikiQickStart on stlchina |
txt | testpage.txt | r1 | manage | 1.2 K | 2002-12-03 - 13:27 | UnknownUser | Test page for i18n (ISO-8859-1) |