NOTE: This is a SupplementalDocument topic which is not included with the official TWiki distribution. Please help maintain high quality documentation by fixing any errors or incomplete content. Put questions and suggestions concerning the documentation of this topic in the comments section below! Use the Support web for problems you are having using TWiki.
Installing TWiki with Internationalisation support (I18N)
This document is useful if you need to use languages other than English for the content of your TWiki 4.0 site; in other words, if you must support international characters in TWiki pages and WikiWords, or would like a translated user interface. This is generally known as Internationalisation (I18N).
There are two types of TWiki I18N:
- Content Internationalisation, which enables the use of international character sets in TWiki pages and WikiWords - this is described in TWiki:Codev.InternationalisationEnhancements (which also provides an overview of TWiki I18N) and was added in releases before TWiki 4.0.
- User Interface Internationalisation, which enables the TWiki user interface to be translated (localised), with some built-in translations to a range of languages - this is described in UserInterfaceInternationalisation and was added in TWiki 4.0.
The details on this page apply specifically to TWiki 4.0 or higher. Setup for earlier versions is similar but involves editing .cfg files.
Key points to note about which locale and character set to use:
- Do not use UTF-8 as your {Site}{Charset} or as part of the {Site}{Locale} setting, unless you need Chinese or Japanese character support.
- If you use an alphabetic language (whether with the Roman alphabet as in French or the Cyrillic alphabet as in Russian), WikiWords using all your language's special characters will work. See CyrillicSupport for examples of Cyrillic usage. Generally, you should use a single-byte character set such as ISO-8859-*, KOI8-R, etc, but not UTF-8.
- If you use a non-alphabetic language, e.g. most East Asian languages including ideogrammatic languages such as Chinese and Japanese, you will need to use UTF-8 or EUC-*. TWiki support for UTF-8 (Unicode) is somewhat limited, but is quite usable - in particular, only English WikiWords will work. Since most East Asian sites don't require alphabetic languages other than English, this is not a significant limitation. See JapaneseAndChineseSupport for examples of Japanese and Chinese usage, and be sure to use UTF-8 or EUC-*, not GB2312, Shift-JIS or other character encodings!
- If you only have .utf8 locales, you may need to generate locales (on Ubuntu / Debian, see man locale.gen) or install a locale package (on SUSE, Red Hat, etc) - see the section on adding locales. Don't just use a .utf8 locale if you're a European language user, as it won't work at all well.
Internationalisation Setup
By default, TWiki is configured to support ASCII letters (English language with no accents) in WikiWords, and ISO-8859-1 (Western European) characters in page contents. If that's OK for you, just ignore this page.
If your TWiki site will be used by non-English speakers, TWiki can be configured for Internationalisation ('I' followed by 18 letters, then 'N', or I18N). Specifically, TWiki will support suitable accented characters in WikiWords (as well as languages such as Japanese or Chinese in which WikiWords do not apply), and will support virtually any character set in the contents of pages. In addition, searching and sorting will work properly with I18N data, i.e. you can include accented characters such as ü when searching, and WebIndex will show you TWiki topics ordered using the correct collation sequence for your locale.
To configure internationalisation support:
- Using TWiki 4.0 or higher, run the configure tool and open the Localisation section
- For user interface internationalisation, enable {UserInterfaceInternationalisation}, and select the languages for which you require a translated user interface
- For content internationalisation, enable {UseLocale} by ticking its checkbox. TWiki will now use the I18N parameters set in the rest of this section.
- Log on to your TWiki server using SSH or Telnet, and type the Unix/Linux command locale -a to find a suitable 'locale' for your use of TWiki. (If you are on Windows, you won't be able to use locales; see below.) A locale that includes a dot followed by a character set is recommended, e.g. pl_PL.ISO-8859-2 for Poland. Consult your server system administrator if you are not sure which locale to use, and see the point above about generating or installing locales if there is no suitable locale already available.
- In configure, set the {Site}{Locale} parameter to your chosen locale, e.g. pl_PL.ISO-8859-2 for Poland.
- Set the {Site}{Charset} to the same character set used in the locale, e.g. ISO-8859-2 for Poland as above. (Note that this will not be needed in a future release, and was not necessary in CairoRelease (TWiki 3) or earlier.)
- Save your settings
- Check your I18N setup by reviewing all parts of the Localisation section of configure for warnings - this provides some diagnostics for I18N setup, and in particular checks that your chosen locale can be used successfully.
- Always check your web browser is using the right encoding to view your non-English pages, especially when you are editing a page whose name is a non-English WikiWord. For example, the browser should show ISO-8859-2 or the equivalent as its Encoding setting, given the configure setting of {Site}{Locale} above.
- In InternetExplorer, use View | Encoding
- In Firefox, use View | Character Encodings - Firefox is a good browser for experimenting with I18N since it includes a wide range of international character fonts 'out of the box'.
- Try out your TWiki by creating pages in the Sandbox web that use international characters in WikiWords and checking that searching, WebIndex, Backlinks and other features are working OK.
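When choosing a locale in the steps above, it can help to narrow the locale -a listing down to names that carry an explicit, non-UTF-8 character set. The following is only a sketch: the function name non_utf8_locales is invented for this example, and the sample list stands in for your server's real locale -a output (which may spell character sets differently, e.g. iso88592).

```shell
# Keep locale names that have a character set after a dot,
# and drop UTF-8 variants. Reads a `locale -a` style list on stdin.
# (non_utf8_locales is a hypothetical helper, not part of TWiki.)
non_utf8_locales() {
  grep -F '.' | grep -v -i -E '\.utf-?8'
}

# Sample input standing in for real `locale -a` output.
printf '%s\n' C POSIX pl_PL.ISO-8859-2 pl_PL.utf8 fr_FR.UTF-8 ru_RU.KOI8-R |
  non_utf8_locales
```

On a real server you would run locale -a | non_utf8_locales instead. If nothing is printed, you probably need to add a locale as described in the next section.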
Adding locales
On some Linux/Unix systems, you may find you only have UTF-8 locales. These don't work well with TWiki (until TWiki:Codev.UnicodeSupport is developed), so you may need to add new non-Unicode locales. The examples below are for ISO-8859-1 in France - you can modify these as you require.
Please add details for your Linux distribution or Unix system below, as a new section.
Adding locales on Ubuntu and Debian
Since Ubuntu and Debian don't generate all locales at install time, you will need to generate the locale you require yourself:
- Use a suitable editor (e.g. nano) to edit /etc/locale.gen - add the following line at the end, for whatever locale you require:
fr_FR.ISO-8859-1 ISO-8859-1
- Type the command locale-gen --keep-existing to generate these new locales while keeping existing locales
- Check the locales were created by typing locale -a
- Now run TWiki's configure tool to use this locale and check it can be used OK.
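The Ubuntu/Debian steps above could be scripted roughly as follows. This is a sketch under assumptions: add_locale_line is a made-up helper name, and the example runs against a scratch file so that nothing on the system is changed; the real commands are shown only as comments.

```shell
# Append a line to a locale.gen-style file unless it is already present.
# $1 = file path, $2 = line to add. (Hypothetical helper for illustration.)
add_locale_line() {
  grep -q -x -F -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Dry run against a scratch file instead of /etc/locale.gen.
scratch=$(mktemp)
add_locale_line "$scratch" 'fr_FR.ISO-8859-1 ISO-8859-1'
add_locale_line "$scratch" 'fr_FR.ISO-8859-1 ISO-8859-1'   # second call adds nothing
cat "$scratch"
rm -f "$scratch"

# On the real server, as root, the equivalent would be:
#   add_locale_line /etc/locale.gen 'fr_FR.ISO-8859-1 ISO-8859-1'
#   locale-gen --keep-existing
#   locale -a
```

Checking for the line first means the script can be re-run without piling up duplicate entries in /etc/locale.gen.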
Adding locales on other systems
Add details for new systems here, creating a new section where needed.
Trouble with I18N?
You do need to set {Site}{Charset} at present: due to a change in behaviour introduced in TWiki 4.0 and still present in 4.1.2, you do need to set {Site}{Charset} explicitly, despite the documentation here describing this as an error. This is rather confusing, so just set {Site}{Charset} to exactly the same charset as in the locale (the part after the '.'). This will be changed in a future release as it makes setup of I18N too complex.
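As an illustration of "the part after the '.'", the sketch below pulls the character set out of a locale name so it can be copied into {Site}{Charset}. The function name locale_charset is invented for this example, and stripping an @modifier such as @euro is an assumption about how you would want such names handled.

```shell
# Print the character set portion of a locale name, i.e. the text after
# the first '.', with any trailing @modifier removed.
# Returns non-zero if the name contains no '.'.
# (locale_charset is a hypothetical helper, not part of TWiki.)
locale_charset() {
  case "$1" in
    *.*) _cs=${1#*.}; printf '%s\n' "${_cs%%@*}" ;;
    *)   return 1 ;;
  esac
}

locale_charset pl_PL.ISO-8859-2          # prints ISO-8859-2
locale_charset de_DE@euro || echo 'no charset in locale name'
```

A locale name without a dot, such as de_DE@euro, gives you nothing to copy, which is one reason a locale with an explicit character set is recommended in the setup steps.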
Setting {Site}{Charset} is usually wrong: It is incorrect to set {Site}{Charset} directly in configure, in 99% of all cases - the correct approach is to set {Site}{Locale}, which includes the character encoding on the end (e.g. .iso-8859-1), and also sets the 'locale' needed for other purposes such as WikiWord I18N. The only time you should set the {Site}{Charset} is to override the character encoding when (1) the locale specifies character encoding X and (2) the web browser will only accept a different spelling of X, e.g. iso8859-9 vs iso-8859-9 or Latin-9 (this is a made-up example only, but it can occasionally happen). Currently not true, but will become true again in a future release, see above.
Locale is set but I18N doesn't work: If international characters in WikiWords do not seem to work despite setting the locale (e.g. you are on Windows), you can still use I18N with some features disabled, no matter what Perl version is installed:
- Keep {UseLocale} enabled in configure - note that the {Site}{Locale} setting is still used to set the browser character encoding even though you will not be using the Perl locales feature.
- Disable {LocaleRegexes} - this disables some features (specifically, sorting and searching using accented characters), but enables basic WikiWord I18N to work even if your system has locales that do not work.
- Set the {UpperNational} and {LowerNational} parameters to a string containing the valid upper and lower case non-ASCII letters for your locale, e.g. 'äë...'. For Western European languages using the Roman alphabet, this means accented characters. For Cyrillic and other non-Roman alphabets, this means the entire alphabet. If you are using ISO-8859-1, here are some settings that may help:
- {UpperNational}: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜ
- {LowerNational}: àáâãäåæçèéêëìíîïðñòóôõöøùúûüß
Windows systems: You must use the above workaround of disabling {LocaleRegexes} ('localeRegexes = 0'), whether using Cygwin or ActiveState Perl, since Perl locales are not working on Windows as of TWiki 4.0.
- NOTE: If you have sufficient system resources on your Windows server, you are recommended to investigate using TWiki:Codev.TWikiVMDebianStable - this is very easy to install, and avoids the problems with Perl using Windows locales. It is the only way to get a fully working I18N installation of TWiki on Windows, including features such as proper searching and sorting of WikiWords with accented characters.
Perl 5.005 systems: If international characters in WikiWords aren't working, and you are on Perl 5.005 with working locales, use the 'localeRegexes = 0' workaround above - configure should generate the lists of characters for you, in which case just copy them into the relevant TWiki.cfg settings.
Setting CHARSET is wrong: If you set the CHARSET parameter in TWikiPreferences, that is a mistake - please unset it. The TWiki site character set is derived from the $siteLocale setting in TWiki.cfg, as mentioned in the setup steps above. There is no need to edit TWikiPreferences when configuring I18N.
Other troubleshooting ideas: See the comments in configure for more information and help in setting up I18N.
Support questions about I18N
Some tips to help you get your I18N issues resolved when raising questions in the Support web:
- Read SupportGuidelines and provide all relevant information. In configure, be sure to do 'expand all options' and then save the page as an HTML file - then attach this file to the support question page you're creating. It's much easier to read the configure output if it's in HTML format rather than plain text.
- Provide a simplified example that shows exactly what the problem is - screenshots are a really good idea, with some text or graphics to highlight the error.
- Remember that the person helping you with your support question may not speak the language your site is using, or even be able to read the characters in the case of Chinese, Japanese or most non-European languages - simplified examples with highlighted errors are important for this reason.
- Upload the raw text of the topic (i.e. the topic's .txt file) for your simplified example, as an attachment - it's important to do this, so that the characters are not translated into ISO-8859-1, the character encoding used by TWiki. (You can ignore this if you're using ISO-8859-1, but it's still useful to have this attachment).
Upgrading from 01 Feb 2003 release?
If you were using TWiki:Codev.TWikiRelease01Feb2003 support for I18N, with Internet Explorer or Opera, all users should reconfigure their browser so that it sends URLs encoded with UTF-8 (supported since TWiki:Codev.TWikiRelease01Sep2004). For most browsers, this is the default, though TWiki also works with browsers that don't use UTF-8 encoded URLs.
- Internet Explorer 5.0 or higher: in Tools | Options | Advanced, check 'always send URLs as UTF-8', then close all IE windows and restart IE.
- Opera 6.x or higher: in Preferences | Network | International Web Addresses, check 'encode all addresses with UTF-8'.
NOTE: This support for UTF-8 in URLs does not mean that TWiki fully supports UTF-8 as a site character set. However, it is fine to use UTF-8 if you need to support East Asian languages such as Chinese or Japanese.
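To make the browser setting above more concrete, this sketch shows how the same letter, ü, ends up as different bytes in a URL depending on the encoding the browser uses. The helper name urlencode_bytes is made up for this example and is not part of TWiki; for simplicity it percent-encodes every byte, including ASCII ones that a real browser would leave alone.

```shell
# Percent-encode every byte read from stdin as %XX (upper-case hex).
# (urlencode_bytes is a hypothetical helper for illustration only.)
urlencode_bytes() {
  _hex=$(od -An -v -tx1 | tr -d ' \n' | tr 'a-f' 'A-F')
  printf '%s\n' "$_hex" | sed 's/../%&/g'
}

printf '\303\274' | urlencode_bytes   # ü as UTF-8 bytes: prints %C3%BC
printf '\374' | urlencode_bytes       # ü as ISO-8859-1 byte: prints %FC
```

A browser set to send URLs as UTF-8 produces the first form even when the page itself is served in ISO-8859-1.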
--
RichardDonkin - 08 Apr 2006
Comments & Questions about this Supplemental Document Topic
Cool, thank you for writing this guide, very essential for non-English sites!
--
PeterThoeny - 08 Apr 2006
No problem, it's an update of the old TWiki03 section.
It would be good if someone could update the reference to docs on UserInterfaceInternationalisation, not sure where this is documented.
--
RichardDonkin - 10 Apr 2006
I'm looking to construct a site in English and embed the AltaVista Babel Fish tool to let it be viewed in Asian languages. I tried putting it in my sandbox entry earlier - it's a good tool. It seemed to run the translation engine, but Japanese didn't display. Any idea if this will work with I18N?
Second, can a page be edited in either English or Japanese, or a mix of both? I.e. once you have I18N in, does it support everything?
Can anyone help? Thanks.
For anyone that's interested:
http://www.altavista.com/help/free/free_searchbox_transl
--
BruceNiven - 30 Jul 2006
Please ask support questions in the Support web.
--
PeterThoeny - 01 Aug 2006
Bruce, if you're still reading - haven't tried embedding the Altavista translation tool, but the problem is most likely that it outputs Unicode (UTF-8) characters and you aren't using a UTF-8 locale. You would probably need to select a {Site}{Locale} in configure that ends in .utf8.
As for English and Japanese - this is definitely supported, just use EUC-JP or UTF-8 as your site character set.
Putting questions in the Support web ensures they don't get missed like this one.
--
RichardDonkin - 07 Sep 2006
Updated above to flag that setting {Site}{Charset} is usually the wrong thing to do.
--
RichardDonkin - 12 Nov 2006
Just a note that I had a situation where setting {Site}{Charset} was the right thing to do - see CharsetChangeOnDakarUpgrade.
--
MartinRothbaum - 07 Dec 2006
I nop'ed all I18N on this topic to get rid of the ugly question mark link. This is caused by a spec change in WikiWord linking of the TWiki 4.1 release.
--
PeterThoeny - 30 Jan 2007
Updated to highlight that most people should not use UTF-8, and to document the current behaviour of TWiki from 4.0 to 4.1.2 re {Site}{Charset}, which is in fact required for a working content I18N setup.
--
RichardDonkin - 14 Mar 2007
Thanks Richard for clearing the confusion! Next step is to fix the code to work properly again for I18N.
--
PeterThoeny - 14 Mar 2007
Updated to cover the setup for UserInterfaceInternationalisation.
Peter - the code fix is underway and mostly done, just need some time to finalise it.
One oddity is that we seem to be using viewfile for some attachment serving (via explicit link in text), but not for links in attachment table. I think avoiding viewfile is best.
--
RichardDonkin - 15 Mar 2007
Locale generation on Ubuntu has changed. I'd recommend the following:
sudo locale-gen de_DE@euro
A list of possible values (instead of de_DE@euro) can be looked up in the file /usr/share/i18n/SUPPORTED.
Then I would still recommend running
sudo dpkg-reconfigure locales
The result can be checked with
locale -a
--
GerhardHeeke - 16 Jan 2008
I use "de_DE@euro" ({Site}{Locale} & {Site}{Charset}: "de.DE.ISO-8859-15"), which works fine in the browser. But umlauts in e-mails show as "?".
For example: "Danke f�r ihre..." or "Sie m�ssen nun ihre E-Mail Adresse verifizieren. Sie k�nnen..."
"locale -a" says:
C
de_DE
de_DE@euro
de_DE.utf8
en_US.utf8
POSIX
"locale" says:
LANG=de_DE@euro
LC_CTYPE="de_DE@euro"
LC_NUMERIC="de_DE@euro"
LC_TIME="de_DE@euro"
LC_COLLATE="de_DE@euro"
LC_MONETARY="de_DE@euro"
LC_MESSAGES="de_DE@euro"
LC_PAPER="de_DE@euro"
LC_NAME="de_DE@euro"
LC_ADDRESS="de_DE@euro"
LC_TELEPHONE="de_DE@euro"
LC_MEASUREMENT="de_DE@euro"
LC_IDENTIFICATION="de_DE@euro"
LC_ALL=
Can anybody help me?
Thanks a lot!
--
ThomasHesse - 28 Jan 2008
{Site}{LocaleRegexes} is an expert setting in configure, so press the expert button first.
--
ArthurClemens - 16 Feb 2008
After spending some time working with UTF-8 in TWiki, I wrote a post on my blog at http://dot-and-thing.blogspot.com/2008/03/twiki-utf8.html.
--
IvanBaktsheev - 13 Mar 2008
Thank you Ivan. I created TWikibug:Item5437 to track this.
--
PeterThoeny - 13 Mar 2008