Bug: TWiki on Mac OS X server with I18N generates odd looking file names
TWiki's I18N features, when tested with Mac OS X, generate odd-looking file names, due to HFS+ and UFS filesystem UnicodeNormalisation issues. TWiki does work OK, but the filenames are not very easy to use for administrators using the command line.
It's also possible that attachments using some I18N characters, uploaded from Mac clients and downloaded by Windows/Unix clients, could cause problems - not tested.
(The original bug report came from InternationalisationEnhancements - it turned out to be mainly Mozilla UTF8 URL encoding issues.)
See comment by StefanLindmark. Browser has not been configured in any way, but there are no configuration notes for Mozilla. I'm attaching TWiki.cfg and testenv output.
| TWiki version: | |
| TWiki plugins: | |
| Server OS: | Mac OS X 10.2.1 |
| Web server: | Apache 2.0.40 |
| Perl version: | |
| Client OS: | Mac OS X 10.2.1 |
| Web Browser: | Mozilla 1.2 |
- 03 Dec 2002
(From emails) MacOS is creating quite weird-looking filenames, but TWiki is working fine, so I'm setting this to BugResolved. If people using TWiki I18N find the filenames annoying on MacOS X, please open a new bug. StefanLindmark is now testing on Perl 5.6.1 on Linux, which works fine.
- 10 Dec 2002
I've done some more testing to shed some light on how file names are treated in OS X. What I did was:
- Created the topic AnAufHinterInNebenÜberUnterVorZwischen in TWiki running on Linux stored on reiserfs filesystem
- Created the topic AnAufHinterInNebenÜberUnterVorZwischen in TWiki running on OS X stored on HFS+ filesystem
- Created the folder AnAufHinterInNebenÜberUnterVorZwischen in Finder running on OS X stored on HFS+ filesystem
Then, in the same environment as each one was created in, I ran
ls > filename
on each of them. The resulting output files have been attached to this topic. Hopefully these files can be of use to the people who put their skills into further development of i18n.
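For anyone who wants to compare the stored names byte by byte rather than through ls, here is a minimal Python sketch. It is not part of TWiki, and the helper names hex_bytes and dump_names are made up for illustration. It lists a directory as raw bytes, so the result shows what the filesystem returns without any decoding for the terminal's character set.

```python
import os


def hex_bytes(raw):
    """Render a byte string as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in raw)


def dump_names(directory):
    """Return (raw_name, hex_string) for every entry in a directory.

    Passing a bytes path to os.listdir makes it return bytes names,
    so nothing is decoded or re-encoded on the way.
    """
    raw_names = sorted(os.listdir(os.fsencode(directory)))
    return [(raw, hex_bytes(raw)) for raw in raw_names]
```

In such a dump, a precomposed "Ü" stored as UTF-8 would appear as c3 9c, while the decomposed form would appear as 55 cc 88 ("U" followed by a combining diaeresis).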
- 11 Dec 2002
One implication that needs to be investigated is portability. If I run TWiki with i18n enhancements on a server running OS X, what happens if I want to move the site to a box with a different OS/filesystem? Would it be possible to transfer the files straight over to the new environment or would there be a need to recode the filenames?
- 14 Dec 2002
Only one way to find out, so I tried it by doing this:
tar cvf an.tar AnAuf*
scp an.tar mysite.net:upload
tar xvf an.tar
The result can be seen below:
So I guess this is something to worry about if you want to be able to move files around between different systems, as your server platform may shift over time.
- 17 Dec 2002
Interesting - however, I think the best longer term solution is to find out why MacOS X is UTF8-encoding filenames and see if it can be configured to avoid this, or to show the names to the user in ISO8859-1 (or perhaps to just support UTF8 filenames and topic names). Transforming UTF8 filenames into ISO8859-1 when moving server platforms would be another option.
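A rough sketch of what such a transformation could look like, written in Python rather than TWiki's Perl, and assuming the names on the old server are valid UTF-8 (the function name utf8_name_to_latin1 is invented for this example). It composes the characters first, because a name stored in decomposed form contains combining characters that ISO8859-1 cannot represent on their own.

```python
import unicodedata


def utf8_name_to_latin1(raw_name):
    """Convert a UTF-8 encoded file name (bytes) to ISO-8859-1 bytes.

    The name is normalised to NFC first, so that "U" followed by a
    combining diaeresis becomes the single character "Ü".
    Raises UnicodeDecodeError if the input is not valid UTF-8 and
    UnicodeEncodeError if a character has no ISO-8859-1 equivalent.
    """
    text = raw_name.decode("utf-8")
    composed = unicodedata.normalize("NFC", text)
    return composed.encode("iso-8859-1")
```

Letting the encode step raise an error, instead of substituting "?", means a rename script built on this would stop on names it cannot convert rather than silently produce colliding filenames.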
- 19 Dec 2002
Apple technote #1150 documents the Unicode filename encoding of the HFS+ filesystem. With _trace enabled on the RCS file, the output shows that RCS commands are using 8-bit single-character ISO-8859 encoding of filenames (i.e. "å" encoded as E5 hex). But files are still written with Unicode filenames. One idea could be to use RcsLite and see if the RCS commands in their Apple-distributed form are the cause of this.
Transforming filenames on server platform moves makes data portable, but I'm sceptical about having TWiki running on OS X generating a lot of files on the backend that are difficult to browse, backup, restore, etc. I haven't even started thinking about how useful the available backup tools will be when filenames turn up with mixed charsets and script styles (e.g. starting with western chars, and then reversing script direction to right-to-left and using non-western characters). I guess it would be more difficult to handle that situation than having stray "?" replacing 8-bit chars, still leading to recognizable filenames. So my ambition is still to try to find out how to move TWiki away from this Unicode stuff on OS X and behave like other common Unix systems like Linux and Solaris.
- 21 Dec 2002
I've been doing a lot more research into Unicode (see InternationalisationUTF8) and it's a bit clearer what was happening here from reading the HFS+ doc's Unicode section - basically, HFS+ appears to prefer to work in Unicode 2.1, storing characters internally in 16-bit values, and also normalises all filename characters into a decomposed form (i.e. "å" is encoded as "a" followed by the accent as a separate Unicode character). This can be seen in the Finder generated attachment below, which presumably is correct.
The TWiki-generated attachment looks like UTF-8 encoding of the precomposed character (i.e. "å" as a single Unicode codepoint, encoded as two bytes in UTF-8).
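To make the difference concrete, the following Python sketch (an illustration only; the function name encodings_of is made up) returns the byte sequences for a character in the forms discussed above. It uses standard Unicode normalisation from Python's unicodedata module, which is an assumption about how closely this matches what the filesystem does.

```python
import unicodedata


def encodings_of(ch):
    """Return hex byte strings for one character in several forms."""
    composed = unicodedata.normalize("NFC", ch)    # single codepoint
    decomposed = unicodedata.normalize("NFD", ch)  # base + combining mark
    return {
        "ISO-8859-1": composed.encode("iso-8859-1").hex(),
        "UTF-8, precomposed": composed.encode("utf-8").hex(),
        "UTF-8, decomposed": decomposed.encode("utf-8").hex(),
        "16-bit, decomposed": decomposed.encode("utf-16-be").hex(),
    }


for label, hexed in encodings_of("\u00e5").items():  # "å"
    print(f"{label:20} {hexed}")
```

For "å" this gives e5 in ISO-8859-1, c3a5 as precomposed UTF-8, 61cc8a as decomposed UTF-8, and 0061030a as decomposed 16-bit values ("a" followed by the combining ring, U+030A).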
HFS+ actually uses an Apple-modified version of Unicode's Normalisation Form D (NFD, i.e. decomposed), whereas Unix/Linux and W3C standards use Normalisation Form C (NFC, i.e. precomposed). MacOS X 10.2 seems to have recognised this issue and at least provides an API to normalise into NFC, but in any case TWiki would need to normalise filenames read out from the filesystem into NFC - without this, it appears that the conversion back to ISO-8859-1 doesn't work. This is really a MacOS X implementation issue but can be worked around. Possible solutions include:
- TWiki code to do the normalisation to NFC - should be configurable through a setting in TWiki.cfg, enabled on HFS+ filesystems but not on the UFS (Unix style) filesystem. There are some Apple developer docs that describe this in more detail. This is the main option, as it enables non-NFD-capable browsers (e.g. Konqueror 3.1.1) to work with MacOS X and I18N.
- Try using a UTF-8 or other locale setting when administering TWiki files so that the conversion from Unicode NFD format to ISO-8859-1 is avoided or works properly. RCS may not work well with Unicode NFD format, though this should be largely transparent to RCS. This will also be necessary, since the first option doesn't change the use of NFD for filenames.
- Research/test using Perl 5.8.x in case this has addressed this issue. Not covered by Perl 5.8, may be covered by Perl 6.
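The first option could be sketched roughly as follows. This is Python, not TWiki's Perl, and the setting name NORMALISE_FILENAMES_TO_NFC and both function names are hypothetical. It also uses standard NFC from Python's unicodedata module, which may not exactly reverse Apple's modified NFD for every character.

```python
import os
import unicodedata

# Hypothetical switch, standing in for a setting in TWiki.cfg:
# turn on for filesystems that return decomposed names.
NORMALISE_FILENAMES_TO_NFC = True


def normalise_names(names, enabled=NORMALISE_FILENAMES_TO_NFC):
    """Return the names converted to NFC when enabled, unchanged otherwise."""
    if not enabled:
        return list(names)
    return [unicodedata.normalize("NFC", name) for name in names]


def read_topic_names(directory, enabled=NORMALISE_FILENAMES_TO_NFC):
    """List a data directory and normalise the names read from disk."""
    return sorted(normalise_names(os.listdir(directory), enabled))
```

Keeping the normalisation in one small function means the same call could be applied wherever names are read from the filesystem, while names written to disk are left for the filesystem to decompose as it sees fit.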
On testing the Finder-generated file below, using IE5.5 in UTF-8 encoding mode, it was displayed correctly - so IE at least is able to display UTF-8 NFD filenames.
The TWiki-generated file has been corrupted somehow, since the capital Ü was transformed into 0xDBA2, which is an Asian character.
- 11 Sep 2003
I now have a plan for how to solve this issue as part of ProposedUTF8SupportForI18N.
If you do need to convert a whole set of filenames from one character encoding to another, have a look at Bjoern Jacke's tool for this - suggested by the author in email.
- 14 Oct 2003
It seems that UFS filesystems have the same NFD behaviour on Darwin (the FreeBSD-based Unix underlying MacOS), so it's not just HFS+.
There's a related issue mentioned in this W3C list thread - if a MacOS X user attaches a file with a Unicode NFD filename to a TWiki page, by default TWiki would store the filename in UTF-8 without changing the normalisation. This would then mean that users on some other platforms (e.g. Konqueror on Linux) would probably have the NFD filename rendered incorrectly even if the server is not MacOS.
Also, when TWiki is in UTF-8 mode, MacOS X's builtin conversion of Unicode NFD to ISO-8859-1 etc does not apply - the unconverted Unicode NFD characters from the filesystem will remain in NFD mode, resulting in a similar problem.
So it seems that normalisation will be important if there are any MacOS clients or servers involved in a TWiki deployment, and hence for all public TWiki sites.
Are there any Mac users out there who could test this?
- 14 Feb 2004
Back in 2004, the Mozilla suite and Thunderbird fixed the problem of MacOS exposing Unicode NFD normalisation of filenames to the outside world (this caused a problem with MacOS clients attaching files) - the solution was to convert data from MacOS clients from NFD into NFC (which is what the rest of the world uses), see MozillaBug:227547
- 01 Oct 2006
Interesting thread about I18n Filenames and CGI upload - may cause some problems on MacOS X at some point, due to use of NFD normalisation by HFS+. See also Bugs:Item3652 re other attachment issues.
- 18 Mar 2007