User Interface Internationalisation
This document is targeted at developers (core code, plugin code and user interface developers). If you are looking for instructions on configuring your TWiki to work with your local language, see InstallationWithI18N.
Internationalisation (I18N) is a generalisation process: it aims to make an application capable of interacting with the user in various languages, without hard-coded support for any specific language. This page covers one key aspect of I18N, namely enabling the translation of user interface text into several languages.
This topic documents TWiki's user interface I18N and presents guidelines for internationalising templates and TWiki code by ensuring that any English-language text is extracted into a message catalogue that can then be easily translated.
Earlier I18N work on TWiki ensured that international characters work in WikiWords and URLs using 8-bit character sets, as described in InternationalisationEnhancements. The translation support described on this page has been available since TWikiRelease04x00x00 (DakarRelease).
System requirements for user interface I18N
UserInterfaceInternationalisation requires the following Perl modules:
- CPAN:Locale::Maketext::Lexicon (debian apt: liblocale-maketext-lexicon-perl): module supporting PO files (among other formats) for storing translations
- CPAN:Locale::Maketext (debian apt: liblocale-maketext-perl): module supporting internationalisation of user interface text
- For Perl 5.8:
- CPAN:Encode (part of Perl's core since 5.8) for converting strings in PO files (encoded in UTF-8) to {Site}{Charset}.
- For Perl 5.6:
Once these modules are installed, UserInterfaceInternationalisation just works: no setting or preference needs to be configured to enable it.
Note, however, that all translated text is stored in UTF-8 and is converted to the character encoding specified in {Site}{Locale} for display to the user, so you must set a {Site}{Locale}.
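A quick way to confirm that the prerequisites are present is to try loading each module. The following is only a minimal sketch and is not part of TWiki; the helper name module_available is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TWiki): returns 1 if the named module
# can be loaded by the Perl that runs this script, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Only accept plain module names before handing them to string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    return eval "require $module; 1" ? 1 : 0;
}

# Modules named in the requirements list above.
for my $module (qw(Locale::Maketext Locale::Maketext::Lexicon Encode)) {
    printf "%-28s %s\n", $module,
        module_available($module) ? 'ok' : 'MISSING';
}
```

Run it with the same Perl interpreter that your web server uses for TWiki, since a module installed for one Perl is not necessarily visible to another.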
Support for non-English character encodings
All 8-bit character encodings are supported (e.g. ISO-8859-*, KOI8-R, etc), including non-Roman alphabets such as Cyrillic. This support is in production releases including
TWikiRelease03Sep2004, but requires some patches highlighted in
InternationalisationIssues. It works on any Perl version from Perl 5.005_03 upwards.
This support enables use of international characters in
WikiWords, form field names, tables of contents, and so on. East Asian languages are supported as long as they use Unicode (UTF-8) - although this support does not include use of
WikiWords (you must use explicit links to TWiki pages), it is quite usable, and there are many TWiki sites in Chinese, Japanese and probably other East Asian locales. See
InternationalisationEnhancements for background on this work.
Language detection
Language is detected in the following way:
- If the LANGUAGE variable is set (either as a preference or as a session variable), it is assumed to represent the desired language (setting it to a non-existent language causes a fallback to English). The "Change language" feature uses a session variable.
- Otherwise, language is detected from the Accept-Language header sent by the browser: the available language that has the highest priority for the user (as reported by the browser) is used. In this case TWiki uses CPAN:Locale::Maketext's language detection.
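The detection order can be approximated as follows. This is a simplified sketch rather than TWiki's code: TWiki delegates the Accept-Language part to CPAN:Locale::Maketext, whereas this version parses the header by hand. The function name choose_language and its parameters are hypothetical, and falling back to English when no browser language matches is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of the detection order described above.
sub choose_language {
    my ($language_var, $accept_header, @available) = @_;
    my %available = map { lc($_) => 1 } @available;

    # 1. An explicit LANGUAGE setting wins; unknown values fall back to English.
    if (defined $language_var && length $language_var) {
        my $tag = lc $language_var;
        return $available{$tag} ? $tag : 'en';
    }

    # 2. Otherwise rank the browser's Accept-Language entries by q-value.
    my @ranked;
    my $position = 0;
    my $header = defined $accept_header ? $accept_header : '';
    for my $entry (split /\s*,\s*/, $header) {
        my ($tag, @params) = split /\s*;\s*/, $entry;
        next unless defined $tag && length $tag;
        my $q = 1;    # entries without an explicit q-value default to 1
        for my $param (@params) {
            $q = $1 if $param =~ /\Aq=([0-9]+(?:\.[0-9]*)?)\z/;
        }
        push @ranked, { tag => lc $tag, q => $q, position => $position++ };
    }
    @ranked = sort { $b->{q} <=> $a->{q} || $a->{position} <=> $b->{position} }
        @ranked;

    for my $candidate (@ranked) {
        next if $candidate->{q} == 0;    # q=0 means "not acceptable"
        return $candidate->{tag} if $available{ $candidate->{tag} };
        # Also try the primary subtag, e.g. "pt" for "pt-br".
        (my $primary = $candidate->{tag}) =~ s/-.*//;
        return $primary if $available{$primary};
    }
    return 'en';    # assumption: English when nothing matches
}

print choose_language(undef, 'pt-BR,pt;q=0.8,en;q=0.5', qw(en fr pt-br)), "\n"; # pt-br
print choose_language('fr',  'pt-BR,pt;q=0.8,en;q=0.5', qw(en fr pt-br)), "\n"; # fr
```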
TWiki Topics and templates I18N
See the %MAKETEXT{...}% variable in the post-DakarRelease TWikiVariables topic.
tools/xgettext (see below) extracts translatable strings from topics shipped with TWiki. User-created topics will be handled after DakarRelease.
Guidelines for internationalising TWiki topics and templates with %MAKETEXT{...}%
- Don't ever put TWiki
%VARIABLES%
inside translatable strings. Write
%MAKETEXT{"Attachments in topic [_1]", "%TOPIC%"}%
instead of
%MAKETEXT{"Attachments in topic %TOPIC%"}%
- When the string is inside an HTML attribute, be sure the attribute value is delimited with single quotes to avoid confusion. Write
<input type="submit" name="action" value='%MAKETEXT{"Save"}%'/>
instead of
<input type="submit" name="action" value="%MAKETEXT{"Save"}%"/>
- Use
\"
for double quotes inside translatable strings. Example:
%MAKETEXT{"Click on \"Save\" to record your changes."}%
- Try hard to keep HTML out of translated strings, as it makes life harder for translators (although sometimes it cannot be avoided).
- If you need to write something inside square brackets, escape it with tildes (this is a CPAN:Locale::Maketext restriction). Example:
%MAKETEXT{"To save changes: Press the ~[Save Changes~] button."}%
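The tilde escaping in the last guideline can be tried directly against CPAN:Locale::Maketext from Perl. This is a minimal sketch; the Demo::L10N classes are hypothetical stand-ins and are not TWiki's own I18N classes.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical project class for this sketch (not TWiki's own I18N class).
package Demo::L10N;
our @ISA = ('Locale::Maketext');

# English lexicon; _AUTO lets an unknown key act as its own translation.
package Demo::L10N::en;
our @ISA     = ('Demo::L10N');
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = Demo::L10N->get_handle('en') or die "no language handle";

# ~[ and ~] produce literal square brackets instead of opening a
# bracket-notation group such as [_1].
print $lh->maketext('To save changes: Press the ~[Save Changes~] button.'), "\n";
# prints: To save changes: Press the [Save Changes] button.
```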
General guidelines for user interface text I18N
There are some guidelines to be followed when internationalising an application.
Try to follow all of them to make it easier to translate TWiki into local languages:
Use interpolation instead of concatenation
Instead of:
maketext("Found ") . $number . maketext(" items.")
Write:
maketext("Found [_1] items.", $number)
The same is valid in templates. Instead of:
%MAKETEXT{"This is the "}% %WEB% %MAKETEXT{"web"}%
Write:
%MAKETEXT{"This is the [_1] web", "%WEB%"}%
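The reason interpolation matters is that a translator can move the placeholder, which is impossible when the sentence is assembled from fragments. A minimal sketch with plain CPAN:Locale::Maketext follows; the Demo::L10N classes are hypothetical and the Portuguese entry is only illustrative.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical project class for this sketch (not TWiki's own I18N class).
package Demo::L10N;
our @ISA = ('Locale::Maketext');

# English: _AUTO lets an unknown key act as its own translation.
package Demo::L10N::en;
our @ISA     = ('Demo::L10N');
our %Lexicon = ( _AUTO => 1 );

# Illustrative Portuguese entry: the number moves to the front.
package Demo::L10N::pt;
our @ISA     = ('Demo::L10N');
our %Lexicon = ( 'Found [_1] items.' => '[_1] itens encontrados.' );

package main;

for my $lang (qw(en pt)) {
    my $lh = Demo::L10N->get_handle($lang) or die "no handle for $lang";
    # One complete sentence per call; [_1] is replaced by the argument.
    print $lh->maketext('Found [_1] items.', 42), "\n";
}
# prints:
#   Found 42 items.
#   42 itens encontrados.
```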
When dealing with plurals, put them fully inside some context
Depending on the context, plurals can be translated differently in some languages.
Instead of:
maketext("Found ") . $number . (($number > 1) ? maketext(" items") : maketext(" item"))
Write:
($number > 1) ? maketext("Found [_1] items", $number) : maketext("Found [_1] item", $number)
In fact, the rules for inflecting nouns (typically modifying their endings) in the presence of numbers can be very complicated, or may not exist at all. CPAN:Locale::Maketext solves several of these problems for us, but for now just avoid plurals when you can.
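For reference, this is how CPAN:Locale::Maketext itself handles simple plurals through its quant method in bracket notation. The sketch only exercises the Perl module; this page does not say whether %MAKETEXT{...}% in templates supports the same notation, and the Demo::L10N classes are hypothetical.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical project class for this sketch (not TWiki's own I18N class).
package Demo::L10N;
our @ISA = ('Locale::Maketext');

# English lexicon; _AUTO lets an unknown key act as its own translation.
package Demo::L10N::en;
our @ISA     = ('Demo::L10N');
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = Demo::L10N->get_handle('en') or die "no language handle";

# [quant,_1,singular,plural] emits the number followed by the noun form
# that matches it.
for my $count (1, 3) {
    print $lh->maketext('Found [quant,_1,item,items].', $count), "\n";
}
# prints:
#   Found 1 item.
#   Found 3 items.
```

A language class can override quant when its plural rules differ from the English-style default, which is why the whole phrase, number included, should stay inside one translatable string.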
Generating the PO files
Attention: this procedure assumes you are working with TWiki sources from
svn.
tools/xgettext is a utility for extracting all translatable strings from TWiki's Perl code, templates, and topics into a po/TWiki.pot file, which must be copied to create a new TWiki translation.
tools/xgettext
requirements:
- Locale::Maketext::Lexicon perl package
- GNU gettext
To extract the strings, just run tools/xgettext from the root of the TWiki sources:
[somebody@somehost:~/src/twiki]$ tools/xgettext
tools/xgettext will extract strings from all Perl source files, TWiki topics and templates listed in tools/MANIFEST and add the strings to po/TWiki.pot. If there is already a po/TWiki.pot file, the extracted strings are added into the existing po/TWiki.pot, i.e. your existing comments in po/TWiki.pot are preserved.
Extracted strings are also merged into existing translations. Translations and comments already done are preserved.
The merging process will try to guess similar sentences. This happens in two situations:
- A string which was already translated in the PO file had a small change in source code. The old translation is kept.
- A new string is somewhat similar to another one which was already translated. The translation of the older string is also used for the new string.
In both cases, the new strings will be marked as "fuzzy" to indicate that the string
needs "human review". Translation maintainers have to check those strings and remove
their fuzzy tags from the PO file, so TWiki knows that they are correctly translated.
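To find the entries that still need review, a maintainer can scan a PO file for the fuzzy flag, which appears on a "#," comment line directly before the entry. The sketch below is a hypothetical helper, not a TWiki tool; it handles single-line msgid entries only, and the sample entries are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the msgid of every entry flagged "fuzzy"
# in a PO file's text. Handles single-line msgids only.
sub fuzzy_msgids {
    my ($po_text) = @_;
    my @fuzzy;
    my $is_fuzzy = 0;
    for my $line (split /\n/, $po_text) {
        if ($line =~ /\A#,.*\bfuzzy\b/) {
            $is_fuzzy = 1;    # the flag comment precedes its msgid
        }
        elsif ($line =~ /\Amsgid "(.*)"\z/) {
            push @fuzzy, $1 if $is_fuzzy;
            $is_fuzzy = 0;    # the flag applies to one entry only
        }
    }
    return @fuzzy;
}

# Made-up sample entries, for illustration only.
my $sample = <<'PO';
msgid "Save"
msgstr "Salvar"

#, fuzzy
msgid "Found [_1] items."
msgstr "Encontrado [_1] item."
PO

print "$_\n" for fuzzy_msgids($sample);
# prints: Found [_1] items.
```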
Note: tools/xgettext is named after the utility of the same name provided by GNU gettext. TWiki's xgettext doesn't use GNU gettext's xgettext; it was written specifically for TWiki using the Locale::Maketext::Extract module (part of CPAN:Locale::Maketext::Lexicon), extended to handle translatable strings in TWiki templates and topics (CPAN:Locale::Maketext::Extract already handles Perl source code).
Some GNU gettext utilities like msgmerge and msguniq are used in tools/xgettext, and msgfmt can be used for checking translations.
Extracted strings
tools/xgettext extracts strings in two basic forms:
- %MAKETEXT{"My text" ...}% or %MAKETEXT{string="My text" ...}%: for regular user interface element translation.
- $percntMAKETEXT{\"My text\" ...}$percnt or $percntMAKETEXT{string=\"My text\" ...}$percnt: for extracting MAKETEXT when used in %SEARCH{...}% formats. Note that this second form requires the quotes to be escaped, since the string sits inside an already double-quoted string, the format parameter of %SEARCH{...}%.
The actual work of extracting the strings is done by the TWiki::I18N::Extract class.
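To make the first form concrete, the sketch below pulls the translatable string out of each %MAKETEXT{...}% in a piece of topic text with a regular expression. It is a hypothetical illustration and not the TWiki::I18N::Extract implementation; it does not handle the $percnt form.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the TWiki::I18N::Extract implementation: collect
# the first string argument of each %MAKETEXT{...}% in some topic text.
sub maketext_strings {
    my ($text) = @_;
    my @strings;
    # Match an optional string= prefix, then a double-quoted string that
    # may contain backslash-escaped characters such as \".
    while ($text =~ /%MAKETEXT\{\s*(?:string=)?"((?:[^"\\]|\\.)*)"/g) {
        my $string = $1;
        $string =~ s/\\"/"/g;    # undo the \" escaping used inside strings
        push @strings, $string;
    }
    return @strings;
}

# Sample text built from the examples used earlier on this page.
my $topic = <<'TOPIC';
<input type="submit" name="action" value='%MAKETEXT{"Save"}%'/>
%MAKETEXT{"Attachments in topic [_1]", "%TOPIC%"}%
%MAKETEXT{string="My text"}%
%MAKETEXT{"Click on \"Save\" to record your changes."}%
TOPIC

print "$_\n" for maketext_strings($topic);
```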
References
- Locale::Maketext::TPJ13 -- article about software localization (try perldoc Locale::Maketext::TPJ13 on your system).
- Web Localization in Perl, by Autrijus Tang (adjust your browser to UTF-8, if needed)
--
AntonioTerceiro - 14 Jan 2006
Discussion
Excellent to see this work progressing! It would be useful to have some reference to the existing
InternationalisationEnhancements, which mainly focus on correct handling of TWiki pages and
WikiWords in various languages. This
UserInterfaceInternationalisation page is really covering internationalisation support for message text, which is a step beyond the existing work.
--
RichardDonkin - 11 Sep 2005
I've modified the intro a bit to reflect the fact that this page covers one part of
I18N, not the whole problem. Also, I would like to see this page renamed since it covers only internationalisation of message text in the user interface, which is not the whole of
I18N of course - how about
UserInterfaceInternationalisation or
MessageTextInternationalisation? The English-style spelling of Internationalisation is already quite prevalent in TWiki page names (yes, I am English) so we should either change those over to US spelling or change this page to use the 'isation' ending.
Re the Perl code
I18N section, I've inserted a reference to the existing
InternationalisationGuidelines which cover
I18N (in the non-message-text sense) of core and plugin code. This section should be merged into that page once it's matured a bit.
Another issue is that we seem to have four separate pages covering localisation framework activity... Some refactoring would be good.
--
RichardDonkin - 18 Sep 2005
Not sure why Perl 5.6 doesn't work for character encoding conversion - see my comment on
Bug 482 for a bit more.
Also renamed this topic!
--
RichardDonkin - 26 Sep 2005
Hi, Richard. Thank you for your comment.
See
my comment on Bug 482.
--
AntonioTerceiro - 27 Sep 2005
Antonio - just to keep hassling you about Perl versions, I think that
CPAN:Unicode::MapUTF8 should work on Perl 5.5 (5.005_03) as well as 5.6... Of course, if we decide to change the general TWiki
TWikiSystemRequirements that might be OK - not sure how common 5.5 is these days.
--
RichardDonkin - 03 Oct 2005
tools/xgettext
does not run on Mac OS X 10.4. I get this feedback:
I: scanning sources, it may take some time...
Can't call method "extract_file" on an undefined value at ./tools/xgettext line 47.
- This looks like
CPAN:Locale::Maketext::Lexicon
is missing? -- SteffenPoulsen - 19 Oct 2005
--
ArthurClemens - 18 Oct 2005
Arthur - could you provide testenv output as per
SupportGuidelines, and create a bug entry over on the
Develop Branch TWiki?
--
RichardDonkin - 19 Oct 2005
Just made a general review of this document.
--
AntonioTerceiro - 14 Jan 2006
Unfortunately, in my installation
UserInterfaceLocalisation didn't "just work". I finally found out that, in addition to Locale::Maketext, the
I18N::LangTags module is also required. Now everything works fine.
--
BertShome - 19 Mar 2006
I18N::LangTags
is a
Locale::Maketext
requirement. And in Perl 5.8, AFAICT, both modules are part of the Perl core.
--
AntonioTerceiro - 21 Mar 2006
Added a link to the updated
InternationalisationGuidelines and a few other minor edits. Also added this note above, which should help avoid people mis-configuring TWiki for
I18N - already in
InstallationWithI18N.
- NOTE: It is incorrect to set {Site}{Charset} directly in configure, in 99% of all cases - the correct approach is to set {Site}{Locale}, which includes the character encoding on the end (e.g. .iso-8859-1), and also sets the 'locale' needed for other purposes such as WikiWord I18N. The only time you should set the {Site}{Charset} is to override the character encoding when (1) the locale specifies character encoding X and (2) the web browser will only accept a different spelling of X, e.g. iso8859-9 vs iso-8859-9 or Latin-9 (this is a made-up example only but it can occasionally happen).
--
RichardDonkin - 10 Nov 2006
The tail of discussion has been moved to
ProperLanguageSwitching