PageModeRssEncodeBug (16 Apr 2003, RichardDonkin)

Question

  • TWiki version: 01 Feb 2003
  • Perl version: 5.005_03
  • Web server & version: Apache/1.3.26 (Unix)
  • Server OS: 4.6-STABLE FreeBSD i386
  • Web browser & version: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030414
  • Client OS: 4.6-STABLE FreeBSD i386

The procedure in TWiki.pm at line number 1158, which encodes characters as XML entities if( $pageMode eq 'rss' ), incorrectly assumes the ISO-8859-1 charset. This breaks international compatibility, e.g. with the koi8-r charset. The procedure should convert characters from the specified charset into Unicode, then encode them as entities (if needed).

When a browser sees &#229; it always displays lowercase `a` with ring (å), NOT the Russian koi8-r character with value 229, even if the charset of the document is specified as koi8-r.
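
As an illustration (in Python rather than TWiki's Perl, and using only the standard library), the difference between the numeric character reference with value 229 (written `&#229;`) and the raw KOI8-R byte 229 can be checked directly. The variable names are arbitrary.

```python
import html

# A numeric character reference names a Unicode code point,
# independent of the charset the document is declared in.
from_reference = html.unescape("&#229;")

# The raw byte 229 (0xE5), interpreted as KOI8-R, is a different character.
from_koi8r_byte = bytes([229]).decode("koi8_r")

print(hex(ord(from_reference)))   # 0xe5  (LATIN SMALL LETTER A WITH RING ABOVE)
print(hex(ord(from_koi8r_byte)))  # 0x415 (CYRILLIC CAPITAL LETTER IE)
```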

-- SergeySolyanik - 14 Apr 2003

Answer

You probably have this set up correctly, but just to check: do you have KOI8-R as your %CHARSET% setting, i.e. the last part of $siteLocale? Also, is %CHARSET% used in your version of CVSget:templates/view.rss.tmpl? If so, displaying &#229; should correctly display the corresponding KOI8-R character, not the ISO-8859-1 å.

The code simply takes a character with the high bit set and encodes it as an HTML entity, without assuming anything about the charset except that it is an 8-bit charset. If you have CHARSET set correctly, this should work. See NationalCharactersEncodedInSearchResults, which was the reason for this behaviour: it was done for someone using KOI8-R and tested using this charset with Mozilla and InternetExplorer. Some Cyrillic test pages are at CyrillicSupport, though the RSS feed on my site is broken at the moment.
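
The byte-wise behaviour described here can be sketched roughly as follows. This is a Python stand-in, not TWiki's actual Perl code, and `encode_high_bit_bytes` is a hypothetical name chosen for illustration. It shows that such an encoder emits the byte value as the entity number, whatever charset the bytes are in.

```python
def encode_high_bit_bytes(data: bytes) -> str:
    """Hypothetical stand-in: turn every byte with the high bit set into &#N;."""
    out = []
    for b in data:
        if b >= 0x80:
            # The entity number is the raw byte value, not a Unicode code point.
            out.append(f"&#{b};")
        else:
            out.append(chr(b))
    return "".join(out)

# Cyrillic capital IE (U+0415) is the single byte 0xE5 in KOI8-R.
koi8r_bytes = "\u0415".encode("koi8_r")
print(encode_high_bit_bytes(koi8r_bytes))  # &#229;
```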

-- RichardDonkin - 15 Apr 2003

Yes, I have the correct %CHARSET% in view.rss.tmpl, and $siteLocale is ru_RU.KOI8-R. But the assumption that a character entity is displayed as the KOI8-R character is incorrect! No standards-compliant browser should apply the document encoding to entities, and Mozilla behaves accordingly. &#229; is å in any encoding; that is what entities are for. Also, the NationalCharactersEncodedInSearchResults example doesn't use HTML entities, it uses 8-bit characters.

I think that using Unicode (UTF-8) would be good enough for RSS feeds: pages should be mapped from the local charset into UTF-8. Or, if we use another charset, TWiki should not encode characters as entities.
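
A minimal sketch of the two charset-aware options, again in Python rather than TWiki's Perl: either decode from the site charset and emit numeric references for the real Unicode code points, or transcode to UTF-8. The function names `to_xml_safe_ascii` and `to_utf8` are hypothetical.

```python
def to_xml_safe_ascii(data: bytes, charset: str) -> str:
    """Decode from the site charset, then write non-ASCII as &#N; references."""
    text = data.decode(charset)
    # 'xmlcharrefreplace' is a standard codec error handler; it uses the
    # Unicode code point of each character that ASCII cannot represent.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def to_utf8(data: bytes, charset: str) -> bytes:
    """Transcode from the site charset into UTF-8."""
    return data.decode(charset).encode("utf-8")

print(to_xml_safe_ascii(b"\xe5", "koi8_r"))  # &#1045;
print(to_utf8(b"\xe5", "koi8_r"))            # b'\xd0\x95'
```

Note that this only handles non-ASCII characters; markup-significant characters such as `&` and `<` would still need to be escaped separately.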

-- SergeySolyanik - 16 Apr 2003

The simplest fix is just to remove all XML entity encoding for search results, for all non-Unicode character sets (UTF-8 not being supported yet). See NationalCharactersEncodedInSearchResults which I have re-opened as a result, and thanks for logging this. I've attached an example of an RSS feed that includes one URL to a TWiki topic with KOI8-R name - this should work in any browser since the encoding on this page is KOI8-R, demonstrating that avoiding use of XML entities will fix this bug. The summary doesn't work in the attachment since it uses XML entities, illustrating the issue.
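
The effect can be illustrated with an XML parser, assuming it honours a KOI8-R encoding declaration (CPython's `xml.etree.ElementTree` is expected to). The two tiny documents below are made up for this sketch and are not taken from the attachment: one carries the raw KOI8-R byte, the other the entity with the byte value.

```python
import xml.etree.ElementTree as ET

# Raw 8-bit character, decoded by the parser using the declared charset.
raw_doc = b'<?xml version="1.0" encoding="koi8-r"?><title>\xe5</title>'

# Numeric reference: resolved as a Unicode code point, ignoring the charset.
ref_doc = b'<?xml version="1.0" encoding="koi8-r"?><title>&#229;</title>'

raw_text = ET.fromstring(raw_doc).text
ref_text = ET.fromstring(ref_doc).text

# The raw byte should come out as the Cyrillic letter (U+0415),
# while the reference should come out as Latin a with ring (U+00E5).
print(hex(ord(raw_text)), hex(ord(ref_text)))
```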

-- RichardDonkin - 16 Apr 2003

Topic attachments
WebRss-koi8-r-cutdown.xml (3.0 K, 16 Apr 2003 - 15:16, RichardDonkin): KOI8-R encoded RSS feed, cut down.
Topic revision: r6 - 16 Apr 2003 - 15:18:00 - RichardDonkin
 