NOTE: This is a SupplementalDocument topic which is not included with the official TWiki distribution. Please help maintain high quality documentation by fixing any errors or incomplete content. Put questions and suggestions concerning the documentation of this topic in the comments section below! Use the Support web for problems you are having using TWiki.

Installing TWiki with Internationalisation support (I18N)

Introduction

This document is useful if you need to use languages other than English for the content of your TWiki site, or in other words, you must support international characters in TWiki pages and WikiWords, or would like a translated user interface. This is generally known as Internationalisation (or I18N, from 'I' followed by 18 letters, then 'N').

There are two types of TWiki I18N:

  • Content Internationalisation, which enables the use of international character sets in TWiki pages and WikiWords - this is described in TWiki:Codev.InternationalisationEnhancements (which also provides an overview of TWiki I18N).
  • User Interface Internationalisation, which enables the TWiki user interface to be translated (localised), with some built-in translations to a range of languages - this is described in UserInterfaceInternationalisation and was added in TWiki 4.0.
The details on this page apply to TWiki 4.0 or higher specifically. Setup for earlier versions is similar, but involves editing .cfg files.

Key points to note about which locale and character set (encoding) to use:

  • Once you choose an encoding and start to generate pages, it is very difficult to change to a different encoding. So it's very important you make the right choice when you first install TWiki.
  • If you use an alphabetic language (whether with Roman alphabet as in French or Cyrillic alphabet as in Russian), WikiWords using all your language's special characters will work. See CyrillicSupport for examples of Cyrillic usage. Generally, you will be better off using a single-byte character set such as ISO-8859-*, KOI8-R, etc, rather than UTF-8. This is because the support for multibyte character sets in TWiki is not complete, and you may run into issues using it.
  • If you use a non-alphabetic language, e.g. most East Asian languages including ideogrammatic languages such as Chinese and Japanese, you will need to use UTF-8 or EUC-*. TWiki support for UTF-8 is somewhat limited, but is quite usable - in particular, only English WikiWords will work. Since most East Asian sites don't require alphabetic languages other than English, this is not a significant limitation. See JapaneseAndChineseSupport for examples of Japanese and Chinese usage, and be sure to use UTF-8 or EUC-*, not GB2312, Shift-JIS or other encodings!
  • If you want to use WYSIWYG editing with international character sets, you must update your versions of WysiwygPlugin and TinyMCEPlugin from the TWiki.org Plugins site. Older versions of these plugins do not work with encodings other than iso-8859-1.
For a detailed technical discussion of what character encodings are, see UnderstandingEncodings.

Internationalisation Setup

By default, TWiki is configured to support ASCII letters (English language with no accents) in WikiWords, and ISO-8859-1 (Western European) characters in page contents. If that's OK for you, just ignore this page.

If your TWiki site will be used by non-English speakers, TWiki can support suitable accented characters in WikiWords (as well as languages such as Japanese or Chinese in which WikiWords do not apply), and will support virtually any character set in the contents of pages. In addition, searching and sorting will work properly with I18N data, i.e. you can include accented characters such as ü when searching, and WebIndex will show you TWiki topics ordered using the correct collation sequence for your locale.

To configure internationalisation support:

  1. Run the configure tool and open the Localisation section
  2. For user interface internationalisation, enable {UserInterfaceInternationalisation}, and select the languages for which you require a translated user interface
  3. For content internationalisation, enable {UseLocale} by ticking its checkbox. TWiki will now use the I18N parameters set in the rest of this section.
  4. Log on to your TWiki server using SSH or Telnet, and type the Unix/Linux command locale -a to find a suitable 'locale' for your use of TWiki. (If you are on Windows, you won't be able to use locales, see below.) A locale that includes a dot followed by a character set is recommended, e.g. pl_PL.ISO-8859-2 for Poland. Consult your server system administrator if you are not sure which locale to use, and see the section on installing locales if there is no suitable locale available.
  5. In configure, set the {Site}{Locale} parameter to your chosen locale, e.g. pl_PL.ISO-8859-2 for Poland.
  6. Set {Site}{CharSet} to the same character set used in the locale, e.g. ISO-8859-2 for Poland as above.
    • If you don't set {Site}{CharSet}, TWiki 4.x and later will default to generating pages using iso-8859-1.
    • The only time you should set {Site}{CharSet} to something different from the locale is to override the character encoding when (1) the locale specifies character encoding X and (2) the web browser will only accept a different spelling of X, e.g. iso8859-9 vs iso-8859-9 or Latin-9 (this is a made-up example only, but it can occasionally happen).
  7. Save your settings, and go back to configure
  8. Check your I18N setup by reviewing all parts of the Localisation section of configure for warnings - this provides some diagnostics for I18N setup, and in particular checks that your chosen locale can be used successfully.
  9. Always check your web browser is using the right encoding to view your non-English pages, especially when you are editing a page whose name is a non-English WikiWord. For example, the browser should show ISO-8859-2 or the equivalent as its Encoding setting, given the configure setting of {Site}{Locale} above.
    • In InternetExplorer 6 & 7, use Tools - Internet Options - General - Languages
    • In Firefox 2, use View - Character Encoding. Firefox is a good browser for experimenting with I18N since it includes a wide range of international character fonts 'out of the box'.
  10. Try out your TWiki by creating pages in the Sandbox web that use international characters in WikiWords and checking that searching, WebIndex, Backlinks and other features are working OK.
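
Steps 4 to 6 can be sketched in shell. This is only an illustration: charset_of_locale is a hypothetical helper name, not part of TWiki. It assumes the locale name has the form language_TERRITORY.charset and prints the charset part, which is the value you would enter for {Site}{CharSet}.

```shell
#!/bin/sh
# charset_of_locale is a hypothetical helper, not part of TWiki.
# It prints the character set part of a locale name such as
# pl_PL.ISO-8859-2, and prints nothing if there is no ".charset" part.
charset_of_locale() {
  case "$1" in
    *.*)
      cs=${1#*.}                   # drop everything up to the first dot
      printf '%s\n' "${cs%%@*}"    # drop any @modifier such as @euro
      ;;
    *)
      : # no dot in the name, so no charset can be derived
      ;;
  esac
}

# On the server, list candidate locales first, for example:
# locale -a | grep '^pl_PL'
charset_of_locale pl_PL.ISO-8859-2    # prints ISO-8859-2
```

If the helper prints nothing, the locale name carries no character set, and a locale that includes one is the recommended choice.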

Adding locales

On some Linux/Unix systems, you may want to add new locales. The examples below are for ISO-8859-1 in France - you can modify these as you require. Please add details for your Linux distribution or Unix system below, as a new section.

Adding locales on Ubuntu and Debian

Ubuntu and Debian don't generate all locales at install time, so you will need to generate the locale you want yourself.

Debian

  • Use a suitable editor (e.g. nano) to edit /etc/locale.gen - add the following line at the end, for whatever locale you require:
             fr_FR.ISO-8859-1 ISO-8859-1
  • Type command locale-gen --keep-existing to generate these new locales while keeping existing locales
  • Check the locales were created by typing locale -a
  • Now run TWiki's configure tool to use this locale and check it can be used OK.
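
The Debian steps above could be scripted roughly as follows. This is a sketch under assumptions: add_locale_line is a hypothetical helper, and the commands that change the system are left commented out because they need root.

```shell
#!/bin/sh
# add_locale_line is a hypothetical helper: it appends a line to a
# locale.gen-style file only if that exact line is not already present,
# so running it twice does not create duplicate entries.
add_locale_line() {
  file=$1
  line=$2
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage on a Debian server, as root (commented out so that sourcing
# this sketch has no side effects):
# add_locale_line /etc/locale.gen 'fr_FR.ISO-8859-1 ISO-8859-1'
# locale-gen --keep-existing
# locale -a | grep '^fr_FR'
```

The helper assumes the existing file ends with a newline; if it does not, add one before appending.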

Ubuntu

sudo locale-gen de_DE@euro

A list of possible values (instead of de_DE@euro) can be looked up in the file /usr/share/i18n/SUPPORTED.

Then sudo dpkg-reconfigure locales

The result can be checked with locale -a
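
When checking the result, bear in mind that locale -a often prints the character set in a normalised spelling (for example fr_FR.iso88591 rather than fr_FR.ISO-8859-1), so a plain grep for the name you typed may miss it. The sketch below compares names after normalising both sides; normalise_locale and locale_listed are hypothetical helper names.

```shell
#!/bin/sh
# normalise_locale and locale_listed are hypothetical helpers.
# normalise_locale lowercases the part after the first dot and removes
# hyphens from it, so fr_FR.ISO-8859-1 becomes fr_FR.iso88591.
normalise_locale() {
  case "$1" in
    *.*)
      lang=${1%%.*}    # part before the first dot
      rest=${1#*.}     # character set, possibly followed by @modifier
      printf '%s.%s\n' "$lang" \
        "$(printf '%s' "$rest" | tr 'A-Z' 'a-z' | tr -d '-')"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# locale_listed reads locale names on standard input (one per line) and
# succeeds if the wanted locale is among them after normalisation.
locale_listed() {
  want=$(normalise_locale "$1")
  while IFS= read -r name; do
    if [ "$(normalise_locale "$name")" = "$want" ]; then
      return 0
    fi
  done
  return 1
}

# Usage on the server:
# locale -a | locale_listed fr_FR.ISO-8859-1 && echo "locale is available"
```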

Adding locales on other systems

Add details for new systems here, creating a new section where needed.

Trouble with I18N?

IDEA! Locale is set but I18N doesn't work: If international characters in WikiWords do not seem to work despite setting the locale (e.g. you are on Windows), you can still use I18N with some features disabled, no matter what Perl version is installed:

  1. Keep {Site}{CharSet} set in configure - note that the {Site}{CharSet} setting is still used to set the browser character encoding even though you will not be using the Perl locales feature.
  2. Disable {LocaleRegexes} - this disables some features (specifically, sorting and searching using accented characters), but enables basic WikiWord I18N to work even if your system has locales that do not work.
  3. Set the {UpperNational} and {LowerNational} parameters to a string containing the valid upper and lower case non-ASCII letters for your locale, e.g. 'äë...'. For Western European languages using Roman alphabet, this means accented characters. For Cyrillic and other non-Roman alphabets, this means the entire alphabet. If you are using ISO-8859-1, here are some settings that may help:
    • {UpperNational}: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜ
    • {LowerNational}: àáâãäåæçèéêëìíîïðñòóôõöøùúûüß
IDEA! Windows systems: You must use the workaround above, i.e. disable {LocaleRegexes} (whether using Cygwin or ActiveState Perl), since Perl locales are not working on Windows as of TWiki 4.0.
  • NOTE: If you are using ActivePerl, make sure that you have installed the modules Encode::compat and Unicode::UTF8simple. They might not be installed by default, so you have to add them through the repository manager. Not installing these modules might cause strange characters when using e.g. the CommentPlugin.
  • NOTE: If you have sufficient system resources on your Windows server, you are recommended to investigate using TWiki:Codev.TWikiVMDebianStable - this is very easy to install, and avoids the problems with Perl using Windows locales. It is the only way to get a fully working I18N installation of TWiki on Windows, including features such as proper searching and sorting of WikiWords with accented characters.
IDEA! Perl 5.005 systems: If international characters in WikiWords aren't working, and you are on Perl 5.005 with working locales, use the workaround above (disable {LocaleRegexes}) - configure should generate the lists of characters for you, in which case just copy them into the relevant TWiki.cfg settings.

IDEA! Setting CHARSET is wrong: If you set the CHARSET parameter in TWikiPreferences, that is a mistake - please unset it. The TWiki site character set is taken from the {Site}{CharSet} setting in TWiki.cfg, as mentioned in the setup steps above. There is no need to edit TWikiPreferences when configuring I18N.
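
One way to see which character set the site actually announces is to look at the HTTP response headers. The sketch below assumes your server sends the character set in the Content-Type header; charset_from_headers is a hypothetical helper and the URL in the usage line is a placeholder for your own site.

```shell
#!/bin/sh
# charset_from_headers is a hypothetical helper. It reads HTTP response
# headers on standard input and prints the charset named in the first
# Content-Type header that carries one.
charset_from_headers() {
  tr -d '\r' |
    sed -n 's/^[Cc]ontent-[Tt]ype:.*[Cc]harset=\([^;[:space:]]*\).*$/\1/p' |
    head -n 1
}

# Usage (placeholder URL):
# curl -sI http://wiki.example.com/bin/view/Main/WebHome | charset_from_headers
```

If the value printed differs from {Site}{CharSet}, a leftover CHARSET preference is one thing to look for.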

IDEA! Other troubleshooting ideas: See the comments in configure for more information and help in setting up I18N.

Support questions about I18N

Some tips to help you get your I18N issues resolved when raising questions in the Support web:

  1. Read SupportGuidelines and provide all relevant information. In configure, be sure to do 'expand all options' and then save the page as an HTML file - then attach this file to the support question page you're creating. It's much easier to read the configure output if it's in HTML format not plain text.
  2. Provide a simplified example that shows exactly what the problem is - screenshots are a really good idea, with some text or graphics to highlight the error.
    • Remember that the person helping you with your support question may not speak the language your site is using, or even be able to read the characters in the case of Chinese, Japanese or most non-European languages - simplified examples with highlighted errors are important for this reason.
  3. Upload the raw text of the topic (i.e. the topic's .txt file) for your simplified example, as an attachment - it's important to do this, so that the characters are not translated into ISO-8859-1, the character encoding used by TWiki. (You can ignore this if you're using ISO-8859-1, but it's still useful to have this attachment).

Upgrading from 01 Feb 2003 release?

If you were using TWiki:Codev.TWikiRelease01Feb2003 support for I18N, with Internet Explorer or Opera, all users should reconfigure their browser so that it sends URLs encoded with UTF-8 (supported since TWiki:Codev.TWikiRelease01Sep2004). For most browsers, this is the default, though TWiki also works with browsers that don't use UTF-8 encoded URLs.

  • Internet Explorer 5.0 or higher: in Tools | Options | Advanced, check 'always send URLs as UTF-8', then close all IE windows and restart IE.
  • Opera 6.x or higher: in Preferences | Network | International Web Addresses, check 'encode all addresses with UTF-8'.
NOTE: This support for UTF-8 in URLs does not mean that TWiki fully supports UTF-8 as a site character set. However, it is fine to use UTF-8 if you need to support East Asian languages such as Chinese or Japanese.
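
To illustrate what "URLs encoded with UTF-8" means in practice, the sketch below prints the bytes of ü in both encodings as they would appear percent-encoded in a URL. percent_encode_bytes is a hypothetical helper and, unlike a browser, it escapes every byte, including plain ASCII.

```shell
#!/bin/sh
# percent_encode_bytes is a hypothetical helper: it reads raw bytes on
# standard input and prints each byte as %XX.
percent_encode_bytes() {
  # od prints the bytes as hex pairs; -v stops it collapsing repeated lines
  hex=$(od -An -v -tx1 | tr -d ' \n')
  printf '%s\n' "$hex" | sed 's/../%&/g' | tr 'a-f' 'A-F'
}

printf '\303\274' | percent_encode_bytes   # ü as UTF-8 (two bytes): %C3%BC
printf '\374' | percent_encode_bytes       # ü as ISO-8859-1 (one byte): %FC
```

A browser set to send UTF-8 URLs requests the first form even when the page itself is served in ISO-8859-1.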

Contributors: RichardDonkin, GerhardHeeke, CrawfordCurrie



Comments & Questions about this Supplemental Document Topic

Older comments refactored into the text or deleted by CrawfordCurrie - 31 May 2008. See revision 25 of this page for the refactored content.


Just to give you some additional feedback:

Following this manual and the additional comments, I configured I18N support on Ubuntu 8.04 for a Spanish environment. It works as far as I can see, BUT when a new page with national characters like á is saved, the corresponding file in the data path has a .txt filename containing invalid characters. I still have not fully figured out this UTF, ISO and conversion thing, but I suppose this has to do with the fact that Ubuntu is using utf8 as system locale. But at least Wiki-linking works.

Little side-effect: I am trying to migrate from a Windows TWiki to Ubuntu, and at the moment this filename issue has turned out to be a real problem I have not solved yet. The pages show up fine, but all links coming from Windows are broken. I filed a support page (see MigrationFromWindowsToUbuntu), so if anyone has experience with this issue, his/her help is really appreciated. Thanks.

-- SebastianKlus - 29 May 2008

Does anyone know if the statement above regarding perl locales on Windows still holds? Has anyone ever tried to I18N a Windows server?

-- CrawfordCurrie - 31 May 2008

Yes, it does. I have TWiki installed on a Windows Server and you still have to disable LocaleRegexes. In my case, SiteLocale has to be Spanish_Spain.1252, as it is a Spanish version of Windows Server and the MS locales are different from the ones in a Linux environment. Because of that, UpperNational and LowerNational have to be specified.

-- SebastianKlus - 04 Jun 2008

Hi. I have been reading all of the I18N documentation related to configuring TWiki in a non-English language. I have all the po files and the right options enabled in the configure interface, but I can't get my TWiki to work in Spanish. I can enable French and Portuguese, though. Strange. Any help? Thanks in advance

-- JuanLussich - 18 Jun 2008

Really strange, because I have a Spanish TWiki working on Windows (Windows Server 2003) and on Linux (Ubuntu 8.04) without problems. What happens if you switch e.g. from English to Spanish? Or don't you even have the option to switch into Spanish?

-- SebastianKlus - 19 Jun 2008

Juan: This forum is for feedback on the documentation. Please ask support questions in the Support web.

-- PeterThoeny - 20 Jun 2008

You're right, Peter. My mistake here.

-- JuanLussich - 10 Jul 2008

Please use the Support forum if you have questions about TWiki features. This comment section is about the documentation of this topic.
Topic revision: r38 - 2012-10-02 - PeterThoeny
 
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.