PageModeRssEncodeBug < Support

Question

TWiki version: 01 Feb 2003
Perl version: 5.005_03
Web server & version: Apache/1.3.26 (Unix)
Server OS: 4.6-STABLE FreeBSD i386
Web browser & version: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030414
Client OS: 4.6-STABLE FreeBSD i386

The procedure at line 1158 of TWiki.pm, which encodes characters as XML entities if( $pageMode eq 'rss' ), incorrectly assumes the ISO-8859-1 charset. This breaks international compatibility, e.g. for the koi8-r charset. The procedure should convert characters from the specified charset into Unicode, then encode them as entities (if needed).

When a browser sees &#229; it always displays a lowercase `a` with ring (å), NOT the Russian koi8-r character with value 229, even if the charset of the document is specified as koi8-r.

-- SergeySolyanik - 14 Apr 2003
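
The mismatch reported above can be reproduced outside TWiki. The following is a minimal Python sketch, not TWiki's Perl code. `naive_entity_encode` is a hypothetical stand-in for the reported behaviour (each high-bit byte becomes a numeric entity holding the raw byte value), and `charset_aware_entity_encode` is a hypothetical version of the proposed fix (decode from the declared charset first, then emit Unicode code points).

```python
import html
import unicodedata


def naive_entity_encode(raw: bytes) -> str:
    """Hypothetical stand-in for the reported behaviour: every byte with the
    high bit set becomes &#N;, where N is the raw byte value."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in raw)


def charset_aware_entity_encode(raw: bytes, charset: str) -> str:
    """Hypothetical fix: decode from the declared charset to Unicode first,
    then emit each non-ASCII character as &#N;, where N is its code point."""
    text = raw.decode(charset)
    return "".join(c if ord(c) < 0x80 else f"&#{ord(c)};" for c in text)


koi8_byte = bytes([229])  # one KOI8-R byte: a Cyrillic capital letter

naive = naive_entity_encode(koi8_byte)
fixed = charset_aware_entity_encode(koi8_byte, "koi8_r")

# Numeric character references always name Unicode code points, so the
# document charset plays no part in how a parser resolves them.
print(naive, "->", unicodedata.name(html.unescape(naive)))
print(fixed, "->", unicodedata.name(html.unescape(fixed)))
```

The first line of output names a Latin letter and the second a Cyrillic one, which is the difference between what the feed currently emits and what a KOI8-R reader expects.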

Answer

You probably have this set up correctly, but just to check: do you have KOI8-R as your %CHARSET% setting, i.e. the last part of $siteLocale? Also, is %CHARSET% used in your version of CVSget:templates/view.rss.tmpl? If so, &#229; should display as the corresponding KOI8-R character, not the ISO-8859-1 å.

The code simply takes a character with the high bit set and encodes it as an HTML entity, without assuming anything about the charset except that it is an 8-bit charset. If you have CHARSET set correctly this should work. See NationalCharactersEncodedInSearchResults, which was the reason for this behaviour: it was done for someone using KOI8-R and tested with this charset in Mozilla and InternetExplorer. Some Cyrillic test pages are at CyrillicSupport, though the RSS feed on my site is broken at the moment.

-- RichardDonkin - 15 Apr 2003
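
As a quick way to check the setting mentioned above, here is a small Python sketch of taking "the last part" of a locale string as the charset. The helper name `charset_from_locale` and the ISO-8859-1 fallback are assumptions made for illustration; how TWiki itself derives %CHARSET% is not shown in this topic.

```python
def charset_from_locale(site_locale: str, default: str = "ISO-8859-1") -> str:
    """Return the charset part of a locale such as 'ru_RU.KOI8-R'.

    The fallback value is an assumption for this sketch, used when the
    locale string carries no '.charset' part at all.
    """
    # Drop an optional '@modifier' suffix, e.g. 'de_DE.ISO-8859-15@euro'.
    base = site_locale.split("@", 1)[0]
    # Everything after the first '.' is the charset name.
    _, dot, charset = base.partition(".")
    return charset if dot and charset else default


print(charset_from_locale("ru_RU.KOI8-R"))
```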

Yes, I have the correct %CHARSET% in view.rss.tmpl, and $siteLocale is ru_RU.KOI8-R. But the assumption that a character entity is displayed as a KOI8-R character is incorrect! No standards-compliant browser should apply the document encoding to entities, and Mozilla behaves accordingly. &#229; is å in any encoding; that is what entities are for. And the NationalCharactersEncodedInSearchResults example doesn't use HTML entities; it uses 8-bit characters.

I think that using Unicode (UTF-8) would be good enough for RSS feeds. Pages should be mapped from the local charset into UTF-8. Or, if we use another charset, TWiki should not encode characters as entities.

-- SergeySolyanik - 16 Apr 2003
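
The UTF-8 option suggested above could look roughly like the following Python sketch. It is an illustration only, not TWiki code: `rss_text_as_utf8` is a hypothetical helper, and the sample title is made up. The text is decoded from the site charset, only XML markup characters are escaped, and the result is written as UTF-8 with a matching XML declaration.

```python
from xml.sax.saxutils import escape


def rss_text_as_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode topic text from the site charset, escape XML markup
    characters (&, <, >), and re-encode the result as UTF-8."""
    text = raw.decode(source_charset)
    return escape(text).encode("utf-8")


# The declaration must match the bytes that actually follow it.
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'

# Made-up sample input, encoded the way a KOI8-R site would store it.
koi8_title = "Привет & пока".encode("koi8_r")

item = (XML_DECLARATION
        + b"<title>" + rss_text_as_utf8(koi8_title, "koi8_r") + b"</title>")
```

No numeric entities are needed for the Cyrillic text in this variant, because UTF-8 can represent it directly.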

The simplest fix is just to remove all XML entity encoding for search results, for all non-Unicode character sets (UTF-8 not being supported yet). See NationalCharactersEncodedInSearchResults, which I have re-opened as a result, and thanks for logging this. I've attached an example of an RSS feed that includes one URL to a TWiki topic with a KOI8-R name. This should work in any browser since the encoding on this page is KOI8-R, demonstrating that avoiding the use of XML entities will fix this bug. The summary doesn't work in the attachment since it uses XML entities, illustrating the issue.

-- RichardDonkin - 16 Apr 2003
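
The "no entity encoding" fix described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the actual TWiki change: `escape_markup_only` and `rss_title` are hypothetical names. Bytes with the high bit set pass through untouched, and the XML declaration tells the reader which 8-bit charset to apply to them.

```python
def escape_markup_only(raw: bytes) -> bytes:
    """Escape only XML markup characters; leave bytes >= 0x80 untouched.

    Byte-level replacement is safe for single-byte, ASCII-compatible
    charsets such as KOI8-R, where '&', '<' and '>' keep their ASCII values.
    """
    # '&' must be replaced first so the other escapes are not double-escaped.
    return (raw.replace(b"&", b"&amp;")
               .replace(b"<", b"&lt;")
               .replace(b">", b"&gt;"))


def rss_title(raw: bytes, charset: str) -> bytes:
    """Wrap raw 8-bit text in a <title> element, declaring its charset."""
    declaration = f'<?xml version="1.0" encoding="{charset}"?>\n'.encode("ascii")
    return declaration + b"<title>" + escape_markup_only(raw) + b"</title>"


sample = bytes([229]) + b" & <b>"   # one KOI8-R byte plus markup characters
feed_fragment = rss_title(sample, "koi8-r")
```

Because the high-bit byte is never turned into a numeric entity, its interpretation is left entirely to the declared charset.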

WebForm
SupportStatus	AnsweredQuestions

Attachments

Attachment	History	Size	Date	Who	Comment
WebRss-koi8-r-cutdown.xml	r1	3.0 K	2003-04-16 - 15:16	UnknownUser	KOI8-R encoded RSS feed, cut down.

Topic revision: r6 - 2003-04-16 - RichardDonkin
