NOTE: This is a SupplementalDocument topic which is not included with the official TWiki distribution. Please help maintain high quality documentation by fixing any errors or incomplete content. Put questions and suggestions concerning the documentation of this topic in the comments section below! Use the Support web for problems you are having using TWiki.
Installing TWiki with Internationalisation support (I18N)
Introduction
This document is useful if you need to use languages other than English for the content of your TWiki site, that is, if you must support international characters in TWiki pages and WikiWords, or would like a translated user interface. This is generally known as Internationalisation (or I18N, from 'I' followed by 18 letters, then 'N').
There are two types of TWiki I18N:
- Content Internationalisation, which enables the use of international character sets in TWiki pages and WikiWords - this is described in TWiki:Codev.InternationalisationEnhancements (which also provides an overview of TWiki I18N).
- User Interface Internationalisation, which enables the TWiki user interface to be translated (localised), with some built-in translations to a range of languages - this is described in UserInterfaceInternationalisation and was added in TWiki 4.0.
The details on this page apply specifically to TWiki 4.0 or higher. Setup for earlier versions is similar, but involves editing .cfg files.
Key points to note about which locale and character set (encoding) to use:
- Once you choose an encoding and start to generate pages, it is very difficult to change to a different encoding. So it's very important you make the right choice when you first install TWiki.
- If you use an alphabetic language (whether with Roman alphabet as in French or Cyrillic alphabet as in Russian), WikiWords using all your language's special characters will work. See CyrillicSupport for examples of Cyrillic usage. Generally, you will be better off using a single-byte character set such as ISO-8859-*, KOI8-R, etc, rather than UTF-8. This is because the support for multibyte character sets in TWiki is not complete, and you may run into issues using it.
- If you use a non-alphabetic language, e.g. most East Asian languages including ideogrammatic languages such as Chinese and Japanese, you will need to use UTF-8 or EUC-*. TWiki support for UTF-8 is somewhat limited, but is quite usable - in particular, only English WikiWords will work. Since most East Asian sites don't require alphabetic languages other than English, this is not a significant limitation. See JapaneseAndChineseSupport for examples of Japanese and Chinese usage, and be sure to use UTF-8 or EUC-*, not GB2312, Shift-JIS or other encodings!
- If you want to use WYSIWYG editing with international character sets, you must update your versions of WysiwygPlugin and TinyMCEPlugin from the TWiki.org Plugins site. Older versions of these plugins do not work with encodings other than iso-8859-1.
For a detailed technical discussion of what character encodings are, see UnderstandingEncodings.
Internationalisation Setup
By default, TWiki is configured to support ASCII letters (English language with no accents) in WikiWords, and ISO-8859-1 (Western European) characters in page contents. If that's OK for you, just ignore this page.
If your TWiki site will be used by non-English speakers, TWiki can support suitable accented characters in WikiWords (as well as languages such as Japanese or Chinese in which WikiWords do not apply), and will support virtually any character set in the contents of pages. In addition, searching and sorting will work properly with I18N data, i.e. you can include accented characters such as ü when searching, and WebIndex will show you TWiki topics ordered using the correct collation sequence for your locale.
To configure internationalisation support:
- Run the configure tool and open the Localisation section
- For user interface internationalisation, enable {UserInterfaceInternationalisation}, and select the languages for which you require a translated user interface
- For content internationalisation, enable {UseLocale} by ticking its checkbox. TWiki will now use the I18N parameters set in the rest of this section.
- Log on to your TWiki server using SSH or Telnet, and type the Unix/Linux command locale -a to find a suitable 'locale' for your use of TWiki. (If you are on Windows, you won't be able to use locales; see below.) A locale that includes a dot followed by a character set is recommended, e.g. pl_PL.ISO-8859-2 for Poland. Consult your server system administrator if you are not sure which locale to use, and see the section on installing locales if there is no suitable locale available.
- In configure, set the {Site}{Locale} parameter to your chosen locale, e.g. pl_PL.ISO-8859-2 for Poland.
- Set the {Site}{CharSet} parameter to the same character set used in the locale, e.g. ISO-8859-2 for Poland as above.
- If you don't set {Site}{CharSet}, TWiki 4.x and later will default to generating pages using iso-8859-1.
- The only time you should set {Site}{CharSet} to something different from the locale is to override the character encoding when (1) the locale specifies character encoding X and (2) the web browser will only accept a different spelling of X, e.g. iso8859-9 vs iso-8859-9 or Latin-9 (this is a made-up example only, but it can occasionally happen).
- Save your settings, and go back to configure
- Check your I18N setup by reviewing all parts of the Localisation section of configure for warnings - this provides some diagnostics for I18N setup, and in particular checks that your chosen locale can be used successfully.
- Always check your web browser is using the right encoding to view your non-English pages, especially when you are editing a page whose name is a non-English WikiWord. For example, the browser should show ISO-8859-2 or the equivalent as its Encoding setting, given the configure setting of {Site}{Locale} above.
- In Internet Explorer 6 and 7, use Tools - Internet Options - General - Languages
- In Firefox 2, use View - Character Encoding. Firefox is a good browser for experimenting with I18N since it includes a wide range of international character fonts 'out of the box'.
- Try out your TWiki by creating pages in the Sandbox web that use international characters in WikiWords and checking that searching, WebIndex, Backlinks and other features are working OK.
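The relationship between the {Site}{Locale} and {Site}{CharSet} values in the steps above can be sketched in shell. This is only an illustration and not part of TWiki: charset_from_locale is a hypothetical helper name, and it assumes the locale name has the form language_COUNTRY.charset, optionally followed by an @modifier.

```shell
#!/bin/sh
# Hypothetical helper (not part of TWiki): print the character set part of a
# locale name, i.e. everything after the first dot, with any @modifier removed.
# Returns non-zero when the name contains no dot and so names no character set.
charset_from_locale() {
    case "$1" in
        *.*) cs=${1#*.}                  # drop everything up to the first dot
             printf '%s\n' "${cs%%@*}"   # drop a trailing @modifier, if any
             ;;
        *)   return 1 ;;
    esac
}

charset_from_locale pl_PL.ISO-8859-2    # prints ISO-8859-2
```

The printed value is what you would normally enter as {Site}{CharSet}, unless the browser needs a different spelling of the same character set.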
Adding locales
On some Linux/Unix systems, you may want to add new locales. The examples below are for ISO-8859-1 in France - you can modify these as you require.
Please add details for your Linux distribution or Unix system below, as a new section.
Adding locales on Ubuntu and Debian
Since Ubuntu and Debian don't generate all locales at install time, you will need to generate the locale you require yourself.
Debian
- Use a suitable editor (e.g. nano) to edit /etc/locale.gen
- Add the following line at the end, for whatever locale you require:
fr_FR.ISO-8859-1 ISO-8859-1
- Type the command locale-gen --keep-existing to generate these new locales while keeping existing locales
- Check the locales were created by typing locale -a
- Now run TWiki's configure tool to use this locale and check it can be used OK.
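The Debian steps above could be scripted roughly as follows. This is a minimal sketch, not an official procedure: add_locale_line is a hypothetical helper name, and the commands that change the system are left commented out because they must be run as root.

```shell
#!/bin/sh
# Hypothetical helper: append a "locale charset" line to a locale.gen-style
# file unless an identical line is already present.
# $1 = line to add, $2 = file to edit (defaults to /etc/locale.gen)
add_locale_line() {
    line=$1
    file=${2:-/etc/locale.gen}
    # -F: fixed string, -x: whole line must match, -q: quiet
    grep -Fqx -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Typical use as root (commented out so that this sketch has no side effects):
#   add_locale_line 'fr_FR.ISO-8859-1 ISO-8859-1'
#   locale-gen --keep-existing
#   locale -a
```

Checking for an existing line first means the helper can be run more than once without adding duplicate entries.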
Ubuntu
Generate the locale you require, for example:
sudo locale-gen de_DE@euro
A list of possible values (instead of de_DE@euro) can be looked up in the file /usr/share/i18n/SUPPORTED.
Then run:
sudo dpkg-reconfigure locales
The result can be checked with:
locale -a
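To narrow down the list in /usr/share/i18n/SUPPORTED, something like the following sketch may help. supported_locales is a hypothetical helper name, and the sketch assumes that each line of the file starts with a locale name.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a SUPPORTED-style file that start
# with the given prefix (matched literally, not as a regular expression).
# $1 = prefix such as de_DE, $2 = file (defaults to /usr/share/i18n/SUPPORTED)
supported_locales() {
    awk -v prefix="$1" 'index($0, prefix) == 1' "${2:-/usr/share/i18n/SUPPORTED}"
}

# Example (commented out because the file may not exist on every system):
#   supported_locales de_DE
```

Any locale name printed this way can then be passed to locale-gen as shown above.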
Adding locales on other systems
Add details for new systems here, creating a new section where needed.
Trouble with I18N?
Locale is set but I18N doesn't work: If international characters in WikiWords do not seem to work despite setting the locale (e.g. you are on Windows), you can still use I18N with some features disabled, no matter what Perl version is installed:
- Keep {Site}{CharSet} set in configure - note that the {Site}{CharSet} setting is still used to set the browser character encoding even though you will not be using the Perl locales feature.
- Disable {LocaleRegexes} - this disables some features (specifically, sorting and searching using accented characters), but enables basic WikiWord I18N to work even if your system has locales that do not work.
- Set the {UpperNational} and {LowerNational} parameters to a string containing the valid upper and lower case non-ASCII letters for your locale, e.g. 'äë...'. For Western European languages using Roman alphabet, this means accented characters. For Cyrillic and other non-Roman alphabets, this means the entire alphabet. If you are using ISO-8859-1, here are some settings that may help:
- {UpperNational}: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜ
- {LowerNational}: àáâãäåæçèéêëìíîïðñòóôõöøùúûüß
Windows systems: You must use the above workaround of disabling {LocaleRegexes} ('localeRegexes = 0'), whether using Cygwin or ActiveState Perl, since Perl locales are not working on Windows as of TWiki 4.0.
- NOTE: If you are using ActivePerl, make sure that you have installed the modules Encode::compat and Unicode::UTF8simple. They might not be installed by default, so you may have to add them through the repository manager. Not installing these modules might cause strange characters when using e.g. the CommentPlugin.
- NOTE: If you have sufficient system resources on your Windows server, you are recommended to investigate using TWiki:Codev.TWikiVMDebianStable - this is very easy to install, and avoids the problems with Perl using Windows locales. It is the only way to get a fully working I18N installation of TWiki on Windows, including features such as proper searching and sorting of WikiWords with accented characters.
Perl 5.005 systems: If international characters in WikiWords aren't working, and you are on Perl 5.005 with working locales, use the 'localeRegexes = 0' workaround above - configure should generate the lists of characters for you, in which case just copy them into the relevant TWiki.cfg settings.
Setting CHARSET is wrong: If you set the CHARSET parameter in TWikiPreferences, that is a mistake - please unset it. The TWiki site character set is taken from the {Site}{CharSet} setting in TWiki.cfg, as mentioned in the setup steps above. There is no need to edit TWikiPreferences when configuring I18N.
Other troubleshooting ideas: See the comments in configure for more information and help in setting up I18N.
Support questions about I18N
Some tips to help you get your I18N issues resolved when raising questions in the Support web:
- Read SupportGuidelines and provide all relevant information. In configure, be sure to do 'expand all options' and then save the page as an HTML file - then attach this file to the support question page you're creating. It's much easier to read the configure output if it's in HTML format, not plain text.
- Provide a simplified example that shows exactly what the problem is - screenshots are a really good idea, with some text or graphics to highlight the error.
- Remember that the person helping you with your support question may not speak the language your site is using, or even be able to read the characters in the case of Chinese, Japanese or most non-European languages - simplified examples with highlighted errors are important for this reason.
- Upload the raw text of the topic (i.e. the topic's .txt file) for your simplified example, as an attachment - it's important to do this, so that the characters are not translated into ISO-8859-1, the character encoding used by TWiki. (You can ignore this if you're using ISO-8859-1, but it's still useful to have this attachment).
Upgrading from 01 Feb 2003 release?
If you were using TWiki:Codev.TWikiRelease01Feb2003 support for I18N, with Internet Explorer or Opera, all users should reconfigure their browser so that it sends URLs encoded with UTF-8 (supported since TWiki:Codev.TWikiRelease01Sep2004). For most browsers, this is the default, though TWiki also works with browsers that don't use UTF-8 encoded URLs.
- Internet Explorer 5.0 or higher: in Tools | Options | Advanced, check 'always send URLs as UTF-8', then close all IE windows and restart IE.
- Opera 6.x or higher: in Preferences | Network | International Web Addresses, check 'encode all addresses with UTF-8'.
NOTE: This support for UTF-8 in URLs does not mean that TWiki fully supports UTF-8 as a site character set. However, it is fine to use UTF-8 if you need to support East Asian languages such as Chinese or Japanese.
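To see what "URLs encoded with UTF-8" means in practice, the following sketch percent-encodes the character á in both encodings. percent_encode is a hypothetical helper written for this illustration; unlike a browser, it encodes every byte it is given, including plain ASCII.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode every byte read from stdin.
percent_encode() {
    # od prints each byte as two hex digits; the rest strips the whitespace,
    # prefixes each pair with % and upper-cases the hex digits.
    od -An -v -tx1 | tr -d ' \n' | sed 's/../%&/g' | tr 'a-f' 'A-F'
}

printf '\303\241' | percent_encode; echo   # á as UTF-8 (two bytes): %C3%A1
printf '\341'     | percent_encode; echo   # á as ISO-8859-1 (one byte): %E1
```

A topic name containing á therefore appears as %C3%A1 in a URL sent by a browser configured as above, and as %E1 when the browser uses ISO-8859-1 for URLs.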
Contributors: RichardDonkin, GerhardHeeke, CrawfordCurrie
Comments & Questions about this Supplemental Document Topic
Older comments refactored into the text or deleted by CrawfordCurrie - 31 May 2008. See revision 25 of this page for the refactored content.
Just to give you some additional feedback:
Following this manual and the additional comments, I configured I18N support on Ubuntu 8.04 for a Spanish environment. It works as far as I can see, BUT when a new page with national characters like á is saved, the corresponding file in the data path gets a .txt filename containing invalid characters. I have not yet fully understood this UTF, ISO and conversion business, but I suppose this has to do with the fact that Ubuntu uses UTF-8 as its system locale. At least Wiki-linking works.
Little side-effect: I am trying to migrate from a Windows TWiki to Ubuntu, and at the moment this filename issue turns out to be a real problem that I have not solved yet. The pages show up fine, but all links coming from Windows are broken. I filed a support page (see MigrationFromWindowsToUbuntu), so if anyone has experience with this issue, his/her help is really appreciated. Thanks.
-- SebastianKlus - 29 May 2008
Does anyone know if the statement above regarding Perl locales on Windows still holds? Has anyone ever tried to I18N a Windows server?
-- CrawfordCurrie - 31 May 2008
Yes, it does. I have TWiki installed on a Windows Server and you still have to disable LocaleRegexes. In my case, SiteLocale has to be Spanish_Spain.1252, as it is a Spanish version of Windows Server and the MS locales are different from the ones in a Linux environment. Because of that, UpperNational and LowerNational have to be specified.
-- SebastianKlus - 04 Jun 2008
Hi. I have been reading all of the I18N documentation related to configuring TWiki in a non-English language. I have all the po files and the right options enabled in the configure interface, but I can't get my TWiki to work in Spanish. I can enable French and Portuguese, though. Strange. Any help? Thanks in advance.
-- JuanLussich - 18 Jun 2008
Really strange, because I have a Spanish TWiki working on Windows (Windows Server 2003) and on Linux (Ubuntu 8.04) without problems. What happens if you switch e.g. from English to Spanish? Or don't you even have the option to switch to Spanish?
-- SebastianKlus - 19 Jun 2008
Juan: This forum is for feedback on the documentation. Please ask support questions in the Support web.
-- PeterThoeny - 20 Jun 2008
You're right, Peter. My mistake here.
-- JuanLussich - 10 Jul 2008