Bug: Some Chinese characters using GB2312 break Wiki links
See Support.SomeChineseCharactersBreakWikiLinks for details. Using the GB2312 character encoding breaks TWiki links. The fix is to use either EUC-CN or UTF-8. EUC-CN is not well supported by browsers, but it works with TWiki today. UTF-8 is better supported by browsers, and although TWiki support for it is not finished, it may be fine for Chinese sites.
Test case
- Set $siteLocale in TWiki.cfg to zh_CN.gb2312
- Enter GB2312 characters whose second byte falls in the range 0x00 - 0x7F, so that it overlaps with ASCII characters that TWiki treats specially. A GB2312 test case is not yet available on SomeChineseCharactersBreakWikiLinks.
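Until a proper test case is posted, candidate characters can be generated with the Python sketch below. TWiki itself is Perl, so this is only an illustration. The sketch assumes that Python's `gbk` codec (alias `cp936`) is a fair stand-in for what a "GB2312" locale actually emits. It also uses a deliberately simplified, hypothetical `[[...]]` byte regex rather than TWiki's real link regexes. The helper names `find_test_chars` and `link_target_bytes` are made up for this example.

```python
import re

# Hypothetical, simplified [[...]] link pattern applied to raw bytes.
# TWiki's real link regexes differ; this only shows the failure mode.
LINK = re.compile(rb"\[\[([^\]]+)\]\]")


def find_test_chars(encoding="gbk", trail=ord("]"), limit=5):
    """Return up to `limit` CJK ideographs (U+4E00..U+9FA5) whose
    two-byte encoding has `trail` as its second byte."""
    found = []
    for cp in range(0x4E00, 0x9FA6):
        try:
            raw = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this encoding
        if len(raw) == 2 and raw[1] == trail:
            found.append(chr(cp))
            if len(found) == limit:
                break
    return found


def link_target_bytes(text, encoding="gbk"):
    """Encode '[[text]]' and return the bytes the naive pattern captures."""
    match = LINK.search(b"[[" + text.encode(encoding) + b"]]")
    return match.group(1) if match else None


for ch in find_test_chars():
    raw = ch.encode("gbk")
    # The ']' trail byte closes the link early, so only the lead byte is captured.
    print(f"U+{ord(ch):04X}", raw.hex(), link_target_bytes(ch))
```

With the `gb2312` codec, which Python implements as EUC-CN, `find_test_chars("gb2312")` returns an empty list. In that encoding, both bytes of every character are 0xA1 or above.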
Environment
| TWiki version: | TWikiRelease01Sep2004 |
| TWiki plugins: | any |
| Server OS: | any |
| Web server: | any |
| Perl version: | any |
| Client OS: | any |
| Web Browser: | any |
-- RichardDonkin - 28 Oct 2004
Follow up
Note that Windows code page 936 once meant GB2312-80 encoded with EUC-CN, which works with TWiki. It now means the GBK character set in the GBK encoding, which does not work with TWiki. One side effect is that on Windows, "GB2312" frequently means GBK, a superset of GB2312.
Source: CJKV book, Ken Lunde, p. 202, Oct 2002 printing.
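The superset relationship can be checked with Python's codecs. This assumes that Python's `gb2312` codec (EUC-CN) and `gbk` codec (alias `cp936`) are reasonable models of the old and new meanings of code page 936. The sketch uses a made-up helper name and looks only at the CJK Unified Ideographs block, U+4E00..U+9FA5. It checks that every ideograph GB2312 can encode gets the same two bytes in GBK. It also counts the GBK-only ideographs whose second byte is below 0x80. Those extra characters are the ones that can break TWiki when text labelled "GB2312" is really GBK.

```python
def compare_gb2312_gbk():
    """Compare EUC-CN (Python 'gb2312') with GBK over U+4E00..U+9FA5."""
    stats = {"gb2312": 0, "mismatched": 0, "gbk_only": 0, "gbk_only_ascii_trail": 0}
    for cp in range(0x4E00, 0x9FA6):
        ch = chr(cp)
        try:
            gbk = ch.encode("gbk")
        except UnicodeEncodeError:
            gbk = None
        try:
            euc = ch.encode("gb2312")
        except UnicodeEncodeError:
            euc = None
        if euc is not None:
            stats["gb2312"] += 1
            if gbk != euc:
                stats["mismatched"] += 1  # would contradict the superset claim
        elif gbk is not None:
            stats["gbk_only"] += 1
            if gbk[1] < 0x80:
                stats["gbk_only_ascii_trail"] += 1  # second byte looks like ASCII
    return stats


print(compare_gb2312_gbk())
```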
Fix record
Users of TWiki should use EUC-CN or UTF-8, as mentioned above; see JapaneseAndChineseSupport and ProposedUTF8SupportForI18N.
Code fixed in SVN MAIN to exclude GBK, GB2312 and GB18030 (an extension of GBK), as well as the Korean encodings Johab and UHC.
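A check of this kind might look like the Python sketch below. It is an illustration only; the real fix is Perl code in SVN MAIN and is not reproduced here. The function names are hypothetical. The charset spellings are assumptions about how locale names appear on typical systems, after normalisation to lower case with "-" and "_" removed. `cp936` is included because, as noted above, Windows code page 936 now means GBK. `cp949` is the Windows code page name for UHC.

```python
# Charsets whose second byte can fall in 0x00-0x7F, per the fix record above.
# Spellings are assumed, normalised forms of common locale charset names.
UNSAFE_CHARSETS = {
    "gbk", "cp936",      # GBK (Windows code page 936)
    "gb2312",            # frequently really GBK in practice
    "gb18030",           # extension of GBK
    "johab",             # Korean Johab
    "uhc", "cp949",      # Korean Unified Hangul Code
}


def locale_charset(site_locale):
    """Return the normalised charset of a locale such as 'zh_CN.gb2312'."""
    _, dot, charset = site_locale.partition(".")
    if not dot:
        return ""  # no explicit charset, e.g. 'en_US'
    charset = charset.split("@", 1)[0]  # drop an '@modifier' suffix
    return charset.lower().replace("-", "").replace("_", "")


def is_unsafe_locale(site_locale):
    """True if the locale's charset is on the exclusion list."""
    return locale_charset(site_locale) in UNSAFE_CHARSETS


for loc in ("zh_CN.gb2312", "zh_CN.GBK", "zh_CN.euc-cn", "zh_CN.UTF-8", "ko_KR.johab"):
    print(loc, is_unsafe_locale(loc))
```

Normalising before the lookup means variants such as `GB18030` and `gb-18030` are caught by a single set entry.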
The CJKV book has a useful set of Perl regexes on pp. 1021-1022. They indicate which encodings have a second byte in the range 0x00 - 0x7F and are therefore not usable with TWiki.
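The book's regexes are not reproduced here. As a rough cross-check, the same property can be tested empirically in Python. The sketch below encodes every Basic Multilingual Plane character and looks for a byte below 0x80 after the first byte of a multi-byte sequence. This brute-force scan is a swapped-in approach, not the book's method. It assumes Python's codecs (`euc_cn`, `gbk`, `gb18030`, `johab`, `uhc`) match the encodings TWiki sees. The name `has_ascii_trail_byte` is hypothetical.

```python
def has_ascii_trail_byte(encoding):
    """True if some BMP character encodes to a multi-byte sequence with a
    byte in 0x00-0x7F after its first byte (brute-force scan)."""
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded
        try:
            raw = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # character not in this encoding
        if len(raw) > 1 and any(b < 0x80 for b in raw[1:]):
            return True
    return False


for enc in ("euc_cn", "gbk", "gb18030", "johab", "uhc", "utf-8"):
    print(enc, has_ascii_trail_byte(enc))
```

`euc_cn` (Python's GB2312 codec) passes this check. GB2312 is excluded in the fix anyway, because the label so often means GBK in practice.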
-- RichardDonkin - 28 Oct 2004