
User Interface Internationalisation

This document is targeted at developers (core, plugin, and user interface developers). If you are looking for instructions on configuring your TWiki to work with your local language, see InstallationWithI18N.

Internationalisation (I18N) is a generalization process. It aims to make an application capable of interacting with the user in various languages, without hard-coded support for specific languages. This page covers a key aspect of I18N, namely enabling the translation of text in the user interface into several languages.

This topic documents TWiki's user interface I18N and presents guidelines for internationalising templates and TWiki code by ensuring that any English language text is extracted into a message catalogue that can then be easily translated. Earlier I18N work on TWiki ensured that international characters work in WikiWords and URLs using 8-bit character sets, as described in InternationalisationEnhancements. The translation support described on this page has been available since TWikiRelease04x00x00 (DakarRelease).

See also:

System requirements for user interface I18N

UserInterfaceInternationalisation requires the following Perl modules:

  • CPAN:Locale::Maketext::Lexicon (debian apt: liblocale-maketext-lexicon-perl): module supporting PO files (among other formats) for storing translations
  • CPAN:Locale::Maketext (debian apt: liblocale-maketext-perl): module supporting internationalisation of user interface text
  • For Perl 5.8:
    • CPAN:Encode (part of Perl's core since 5.8) for converting strings in PO files (encoded in UTF-8) to {Site}{Charset}.
  • For Perl 5.6:

Once these modules are installed, UserInterfaceInternationalisation just works. No setting or preference needs to be configured to make it work. Note, however, that all translated text is stored in UTF-8 and is converted to the character encoding specified in {Site}{Locale} for display to the user, so you must set a {Site}{Locale}.
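A quick way to confirm that the modules listed above can be loaded is a short Perl script such as the sketch below. It is not part of TWiki; the helper name module_available is made up for this example, and the module list simply repeats the requirements given above.

```perl
use strict;
use warnings;

# Modules named in the requirements list above.
my @required = qw(Locale::Maketext Locale::Maketext::Lexicon Encode);

# Hypothetical helper: try to load a module by name.
# Returns 1 on success and 0 if the module cannot be loaded.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (@required) {
    printf "%-30s %s\n", $module,
        module_available($module) ? 'ok' : 'MISSING';
}
```

Any module reported as MISSING has to be installed from CPAN or from your distribution's packages before the user interface translations can work.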

Support for non-English character encodings

All 8-bit character encodings are supported (e.g. ISO-8859-*, KOI8-R, etc), including non-Roman alphabets such as Cyrillic. This support is in production releases including TWikiRelease03Sep2004, but requires some patches highlighted in InternationalisationIssues. It works on any Perl version from Perl 5.005_03 upwards.

This support enables use of international characters in WikiWords, form field names, tables of contents, and so on. East Asian languages are supported as long as they use Unicode (UTF-8) - although this support does not include use of WikiWords (you must use explicit links to TWiki pages), it is quite usable, and there are many TWiki sites in Chinese, Japanese and probably other East Asian locales. See InternationalisationEnhancements for background on this work.

Language detection

Language is detected in the following way:

  • If the LANGUAGE variable is set (either as a preference or as a session variable), it is assumed to represent the desired language (setting it to a non-existent language causes a fallback to English). The "Change language" feature uses a session variable.
  • Otherwise, the language is detected from the Accept-Language header sent by the browser: the available language that has the highest priority for the user (as reported by the browser) is used. In this case TWiki uses CPAN:Locale::Maketext's language detection.
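The detection order above can be sketched in plain Perl as follows. This is only an illustration of the two steps, not TWiki's code: TWiki delegates the second step to CPAN:Locale::Maketext, whereas this sketch parses the Accept-Language header by hand. The function pick_language, its parameters, and the list of available languages are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a user interface language.
#   $explicit  - value of the LANGUAGE variable, or undef if unset
#   $accept    - raw Accept-Language header, or undef
#   $available - hash ref whose keys are the language tags with a translation
sub pick_language {
    my ($explicit, $accept, $available) = @_;

    # 1. An explicit LANGUAGE setting wins; unknown values fall back to English.
    if (defined $explicit && length $explicit) {
        return $available->{lc $explicit} ? lc $explicit : 'en';
    }

    # 2. Otherwise rank the browser's languages by q-value (default 1),
    #    keeping header order for equal q-values.
    my @ranked;
    my $position = 0;
    for my $item (split /\s*,\s*/, defined $accept ? $accept : '') {
        my ($tag, @params) = split /\s*;\s*/, $item;
        next unless defined $tag && length $tag;
        my $q = 1;
        for my $param (@params) {
            $q = $1 if $param =~ /^q=([0-9.]+)$/;
        }
        push @ranked, { tag => lc $tag, q => $q, pos => $position++ };
    }
    for my $candidate (sort { $b->{q} <=> $a->{q} || $a->{pos} <=> $b->{pos} } @ranked) {
        return $candidate->{tag} if $available->{ $candidate->{tag} };
    }
    return 'en';    # nothing the browser asked for is available
}

my %available = map { $_ => 1 } qw(en pt-br de);
print pick_language(undef, 'fr;q=0.9, pt-BR;q=0.8, en;q=0.5', \%available), "\n";
print pick_language('de', 'pt-BR', \%available), "\n";
print pick_language('xx', 'pt-BR', \%available), "\n";
```

With the sample data, the first call skips French because no French translation is available, the second honours the explicit setting, and the third falls back to English because "xx" does not exist.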

TWiki Topics and templates I18N

See the %MAKETEXT{...}% variable in the post-DakarRelease TWikiVariables topic.

tools/xgettext (see below) extracts translatable strings from topics shipped with TWiki.

User-created topics will be handled after DakarRelease.

Guidelines for internationalising TWiki topics and templates with %MAKETEXT{...}%

  • Don't ever put TWiki %VARIABLES% inside translatable strings. Write
    %MAKETEXT{"Attachments in topic [_1]", "%TOPIC%"}%
    instead of
    %MAKETEXT{"Attachments in topic %TOPIC%"}%
  • When the string is inside an HTML attribute, be sure the attribute is delimited with single quotes to avoid confusion. Write
    <input type="submit" name="action" value='%MAKETEXT{"Save"}%'/>
    instead of
    <input type="submit" name="action" value="%MAKETEXT{"Save"}%"/>
  • Use \" for double quotes inside translatable strings. Example:
    %MAKETEXT{"Click on \"Save\" to record your changes."}%
  • Try hard to keep HTML out of translated strings, as it makes life harder for translators. (Sometimes, though, there is no way to avoid it.)
  • If you need to write something inside square brackets, escape it with tildes (this is a CPAN:Locale::Maketext restriction). Example:
    %MAKETEXT{"To save changes: Press the ~[Save Changes~] button."}%

General guidelines for user interface text I18N

There are some guidelines to be followed when internationalising an application. Try to follow all of them to make it easier to translate TWiki into local languages:

Use interpolation instead of concatenation

Instead of:

maketext("Found ") . $number . maketext(" items.")

Write:

maketext("Found [_1] items.", $number)

The same is valid in templates. Instead of:

%MAKETEXT{"This is the "}% %WEB% %MAKETEXT{"web"}%

Write:

%MAKETEXT{"This is the [_1] web", "%WEB%"}%
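A single string with a placeholder lets a translator move the inserted value to wherever the target language needs it, which concatenated fragments cannot do. The following sketch illustrates this with CPAN:Locale::Maketext directly. The Demo::L10N classes and the French wording are illustrative assumptions, not taken from TWiki's translations.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical classes for this example only.
{
    package Demo::L10N;
    our @ISA = ('Locale::Maketext');
}
{
    package Demo::L10N::en;
    our @ISA = ('Demo::L10N');
    our %Lexicon = ( _AUTO => 1 );    # English strings translate to themselves
}
{
    package Demo::L10N::fr;
    our @ISA = ('Demo::L10N');
    # Illustrative translation: the placeholder moves to the end of the sentence.
    our %Lexicon = ( 'This is the [_1] web' => 'Ceci est le web [_1]' );
}

for my $lang (qw(en fr)) {
    my $lh = Demo::L10N->get_handle($lang) or die "no handle for $lang";
    print $lh->maketext('This is the [_1] web', 'Main'), "\n";
}
```

With concatenation, the translator would only see "This is the " and "web" as separate fragments and could not reorder them around the web name.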

When dealing with plurals, put them fully inside some context

Depending on the context, plurals can be translated differently in some languages.

Instead of:

maketext("Found ") . $number . (($number > 1) ? maketext(" items") : maketext(" item"))

Write:

($number > 1) ? maketext("Found [_1] items", $number) : maketext("Found [_1] item", $number)

In fact, the rules for inflecting (typically modifying the endings of) nouns in the presence of numbers can be very complicated, or not exist at all. CPAN:Locale::Maketext solves several of them for us, but for now just avoid plurals when you can.
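As an example of the plural support that CPAN:Locale::Maketext provides, its bracket notation includes a quant method that picks the singular or plural form of a noun. The sketch below calls the module directly from Perl. Whether %MAKETEXT{...}% exposes this notation is not covered here, and the Plural::L10N class names are made up for the example.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical classes for this example only.
{
    package Plural::L10N;
    our @ISA = ('Locale::Maketext');
}
{
    package Plural::L10N::en;
    our @ISA = ('Plural::L10N');
    our %Lexicon = ( _AUTO => 1 );
}

my $lh = Plural::L10N->get_handle('en') or die "no language handle available";

# [quant,_1,item] prints the number followed by "item" or "items".
for my $number (1, 3) {
    print $lh->maketext('Found [quant,_1,item].', $number), "\n";
}
```

A language class can override quant (or the numerate method it relies on) to implement its own inflection rules, so the whole sentence stays in one translatable string.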

Generating the PO files

Attention: this procedure assumes you are working with TWiki sources from svn.

tools/xgettext is a utility that extracts all translatable strings from TWiki's Perl code, templates, and topics into a po/TWiki.pot file, which must be copied to create a new TWiki translation.

tools/xgettext requirements:

  • Locale::Maketext::Lexicon perl package
  • GNU gettext

To extract the strings, just run tools/xgettext from TWiki sources root:

[somebody@somehost:~/src/twiki]$ tools/xgettext

tools/xgettext will extract strings from all Perl source files, TWiki topics and templates listed in tools/MANIFEST and add the strings to po/TWiki.pot. If there is already a po/TWiki.pot file, the extracted strings are merged into it, i.e. your existing comments in po/TWiki.pot are preserved.

Extracted strings are also merged into existing translations. Translations and comments already done are preserved.

The merging process will try to guess similar sentences. This happens in two situations:

  • A string which was already translated in the PO file had a small change in source code. The old translation is kept.
  • A new string is somewhat similar to another one which was already translated. The translation of the older string is also used for the new string.

In both cases, the new strings will be marked as "fuzzy" to indicate that the string needs "human review". Translation maintainers have to check those strings and remove their fuzzy tags from the PO file, so TWiki knows that they are correctly translated.
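To find the entries that still need review, a maintainer can look for the "#, fuzzy" flag line that the gettext tools place above such entries. The sketch below lists the msgid of every fuzzy entry in a PO text. The function name fuzzy_msgids and the sample entries are made up for illustration, and only single-line msgid values are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the msgid of every entry flagged "fuzzy".
# Simplification: only single-line msgid values are handled.
sub fuzzy_msgids {
    my ($po_text) = @_;
    my @fuzzy;
    for my $entry (split /\n{2,}/, $po_text) {      # entries are separated by blank lines
        next unless $entry =~ /^#,.*\bfuzzy\b/m;    # flag lines start with "#,"
        push @fuzzy, $1 if $entry =~ /^msgid "(.*)"$/m;
    }
    return @fuzzy;
}

# Made-up sample entries, for illustration only.
my $sample = <<'PO';
#: templates/edit.tmpl
msgid "Save"
msgstr "Salvar"

#: templates/view.tmpl
#, fuzzy
msgid "Attachments in topic [_1]"
msgstr "Anexos"
PO

print "$_\n" for fuzzy_msgids($sample);
```

After reviewing an entry and correcting its msgstr, the maintainer removes the "#, fuzzy" line so the translation is used.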

Note: tools/xgettext is named after the utility with the same name provided by GNU gettext. TWiki's xgettext doesn't use GNU gettext's xgettext; it was written specifically for TWiki and builds on CPAN:Locale::Maketext::Extract, adding support for translatable strings in TWiki templates and topics (CPAN:Locale::Maketext::Extract already handles Perl source code). Some GNU gettext utilities like msgmerge and msguniq are used in tools/xgettext, and msgfmt can be used for checking translations.

Extracted strings

tools/xgettext will extract strings basically in two forms:

  • %MAKETEXT{"My text" ...}% or %MAKETEXT{string="My text" ...}% :
    for regular user interface element translation.
  • $percntMAKETEXT{\"My text\" ...}$percnt or $percntMAKETEXT{string=\"My text\" ...}$percnt :
    for extracting MAKETEXT when used in %SEARCH{...}% formats. Note that this second form requires the strings to be escaped, since they sit inside an already double-quoted string, the format parameter for %SEARCH{...}%.

The actual work of extracting the strings is done by the TWiki::I18N::Extract class.
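The two forms can be recognised with regular expressions along the lines of the sketch below. This is a simplified illustration and not the code of TWiki::I18N::Extract; the function name extract_maketext_strings and the sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect MAKETEXT strings in both forms from some text.
sub extract_maketext_strings {
    my ($text) = @_;
    my @found;

    # Form 1: %MAKETEXT{"My text" ...}% or %MAKETEXT{string="My text" ...}%
    while ($text =~ /%MAKETEXT\{\s*(?:string=)?"((?:[^"\\]|\\.)*)"/g) {
        (my $string = $1) =~ s/\\"/"/g;    # undo \" escapes inside the string
        push @found, $string;
    }

    # Form 2: $percntMAKETEXT{\"My text\" ...}$percnt, as used in SEARCH formats
    while ($text =~ /\$percntMAKETEXT\{\s*(?:string=)?\\"(.*?)\\"/g) {
        push @found, $1;
    }
    return @found;
}

# Made-up sample text covering both forms.
my $sample = <<'TEXT';
<input type="submit" value='%MAKETEXT{"Save"}%'/>
%MAKETEXT{string="Click on \"Save\" to record your changes."}%
%SEARCH{"text" format="$percntMAKETEXT{\"Changed by\"}$percnt $wikiname"}%
TEXT

print "$_\n" for extract_maketext_strings($sample);
```

The second pattern expects the quotes to be backslash-escaped, which mirrors the escaping requirement described for the %SEARCH{...}% format parameter.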

References

  • Locale::Maketext::TPJ13 -- article about software localization (try perldoc Locale::Maketext::TPJ13 on your system).
  • Web Localization in Perl, by Autrijus Tang (adjust your browser to UTF-8, if needed)

-- AntonioTerceiro - 14 Jan 2006


Discussion

Excellent to see this work progressing! It would be useful to have some reference to the existing InternationalisationEnhancements, which mainly focus on correct handling of TWiki pages and WikiWords in various languages. This UserInterfaceInternationalisation page is really covering internationalisation support for message text, which is a step beyond the existing work.

-- RichardDonkin - 11 Sep 2005

I've modified the intro a bit to reflect the fact that this page covers one part of I18N, not the whole problem. Also, I would like to see this page renamed since it covers only internationalisation of message text in the user interface, which is not the whole of I18N of course - how about UserInterfaceInternationalisation or MessageTextInternationalisation? The English-style spelling of Internationalisation is already quite prevalent in TWiki page names (yes, I am English) so we should either change those over to US spelling or change this page to use the 'isation' ending.

Re the Perl code I18N section, I've inserted a reference to the existing InternationalisationGuidelines which cover I18N (in the non-message-text sense) of core and plugin code. This section should be merged into that page once it's matured a bit.

Another issue is that we seem to have four separate pages covering localisation framework activity... Some refactoring would be good.

-- RichardDonkin - 18 Sep 2005

Not sure why Perl 5.6 doesn't work for character encoding conversion - see my comment on Bug 482 for a bit more.

Also renamed this topic!

-- RichardDonkin - 26 Sep 2005

Hi, Richard. Thank you for your comment. See my comment on Bug 482.

-- AntonioTerceiro - 27 Sep 2005

Antonio - just to keep hassling you about Perl versions, I think that CPAN:Unicode::MapUTF8 should work on Perl 5.5 (5.005_03) as well as 5.6... Of course, if we decide to change the general TWiki TWikiSystemRequirements that might be OK - not sure how common 5.5 is these days.

-- RichardDonkin - 03 Oct 2005

tools/xgettext does not run on Mac OS X 10.4. I get this feedback:

I: scanning sources, it may take some time...
Can't call method "extract_file" on an undefined value at ./tools/xgettext line 47.

  • This looks like CPAN:Locale::Maketext::Lexicon is missing? -- SteffenPoulsen - 19 Oct 2005

-- ArthurClemens - 18 Oct 2005

Arthur - could you provide testenv output as per SupportGuidelines, and create a bug entry over on the Develop Branch TWiki?

-- RichardDonkin - 19 Oct 2005

Just made a general review of this document.

-- AntonioTerceiro - 14 Jan 2006

Unfortunately, in my installation the UserInterfaceLocalisation didn't "just work". Finally I found out that, in addition to Locale::Maketext, the I18N::LangTags module is also required. Now everything works fine.

-- BertShome - 19 Mar 2006

I18N::LangTags is a Locale::Maketext requirement. And in Perl 5.8, AFAICT, both modules are part of the Perl core.

-- AntonioTerceiro - 21 Mar 2006

Added a link to the updated InternationalisationGuidelines and a few other minor edits. Also added this note above, which should help avoid people mis-configuring TWiki for I18N - already in InstallationWithI18N.

  • NOTE: It is incorrect to set {Site}{Charset} directly in configure, in 99% of all cases - the correct approach is to set {Site}{Locale}, which includes the character encoding on the end (e.g. .iso-8859-1), and also sets the 'locale' needed for other purposes such as WikiWord I18N. The only time you should set the {Site}{Charset} is to override the character encoding when (1) the locale specifies character encoding X and (2) the web browser will only accept a different spelling of X, e.g. iso8859-9 vs iso-8859-9 or Latin-9 (this is a made-up example only but it can occasionally happen).

-- RichardDonkin - 10 Nov 2006

The tail of discussion has been moved to ProperLanguageSwitching

Topic revision: r45 - 2008-05-31 - CrawfordCurrie
 
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.