
Question

  • TWiki version: 01 Feb 2003
  • Perl version: 5.005_03
  • Web server & version: Apache/1.3.26 (Unix)
  • Server OS: 4.6-STABLE FreeBSD i386
  • Web browser & version: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030414
  • Client OS: 4.6-STABLE FreeBSD i386

The procedure in TWiki.pm at line 1158, which encodes characters as XML entities if( $pageMode eq 'rss' ), incorrectly assumes the ISO-8859-1 charset. This breaks international compatibility, e.g. for the koi8-r charset. The procedure should convert characters from the specified charset into Unicode, then encode them as entities (if needed).

When a browser sees &#229; it always displays a lowercase `a` with ring, NOT the Russian koi8-r character with value 229, even if the charset of the document is specified as koi8-r.
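To make the point concrete, here is a minimal Python sketch (not TWiki code, which is Perl) comparing byte value 229 under the two charsets with what the numeric character reference resolves to. The variable names are made up for illustration; `html.unescape` from the Python standard library stands in for a conforming parser.

```python
import html

# Byte value 229 interpreted under two different 8-bit charsets.
raw = bytes([229])
as_koi8r = raw.decode("koi8_r")        # Cyrillic capital letter IE, U+0415
as_latin1 = raw.decode("iso-8859-1")   # Latin small letter a with ring above, U+00E5

# A numeric character reference names a Unicode code point, so it resolves
# the same way regardless of the charset declared for the document.
from_reference = html.unescape("&#229;")

print(repr(as_koi8r), repr(as_latin1), repr(from_reference))
```

The reference matches the ISO-8859-1 reading only because the first 256 Unicode code points coincide with ISO-8859-1; it never matches the koi8-r reading.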

-- SergeySolyanik - 14 Apr 2003

Answer

You probably have this set up correctly, but just to check... Do you have KOI8-R as your %CHARSET% setting, i.e. the last part of $siteLocale? Also, is %CHARSET% used in your version of CVSget:templates/view.rss.tmpl? If so, displaying &#229; should correctly display the corresponding KOI8-R character, not the ISO-8859-1 å.

The code simply takes a character with the high bit set and encodes it as an HTML entity, without assuming anything about the charset except that it is an 8-bit charset; if you have CHARSET set correctly this should work. See NationalCharactersEncodedInSearchResults, which was the reason for this behaviour: it was done for someone using KOI8-R and tested using this charset with Mozilla and InternetExplorer. Some Cyrillic test pages are at CyrillicSupport, though the RSS feed on my site is broken at the moment.
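The behaviour described above can be modelled roughly as follows. This is a Python sketch of the idea only, not the actual Perl in TWiki.pm; `encode_high_bytes` is a hypothetical name and the sample word is illustrative. It shows what a parser that resolves references as Unicode code points reconstructs from KOI8-R input.

```python
import html

def encode_high_bytes(data: bytes) -> str:
    """Replace every byte with the high bit set by a numeric reference to the byte value."""
    return "".join(f"&#{b};" if b >= 0x80 else chr(b) for b in data)

word = "Привет"                        # sample Russian text (illustrative)
koi8_bytes = word.encode("koi8_r")
markup = encode_high_bytes(koi8_bytes)

# A conforming parser treats each number as a Unicode code point,
# not as a byte in the document's declared charset.
parsed = html.unescape(markup)

print(markup)
print(parsed)
```

In this model the parsed text equals the KOI8-R bytes read as ISO-8859-1, so the original word is not recovered.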

-- RichardDonkin - 15 Apr 2003

Yes, I have the correct %CHARSET% in view.rss.tmpl, and $siteLocale is ru_RU.KOI8-R. But the assumption about displaying a KOI8-R character from an entity is incorrect! No standards-compliant browser should apply the document encoding to entities, and that is how Mozilla behaves. &#229; is å in any encoding; that is what entities are for. And the NationalCharactersEncodedInSearchResults example doesn't use HTML entities; it uses 8-bit characters.

I think that using Unicode (UTF-8) will be good enough for RSS feeds. Pages should be mapped from the local charset into UTF-8. Or, if we use another charset, TWiki should not encode characters.
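A rough sketch of the mapping proposed here, in Python rather than Perl and with hypothetical function names: decode from the site charset first, then either emit UTF-8 or emit numeric references built from the real Unicode code points. Escaping of markup characters such as `&` and `<` is deliberately left out.

```python
def to_utf8(data: bytes, site_charset: str) -> bytes:
    """Map text from the local 8-bit charset into UTF-8."""
    return data.decode(site_charset).encode("utf-8")

def to_ascii_with_references(data: bytes, site_charset: str) -> bytes:
    """Decode first, then replace non-ASCII characters with numeric
    references to their Unicode code points (not their byte values)."""
    text = data.decode(site_charset)
    return text.encode("ascii", errors="xmlcharrefreplace")

koi8_byte = bytes([229])               # Cyrillic capital IE in koi8-r
print(to_utf8(koi8_byte, "koi8_r"))
print(to_ascii_with_references(koi8_byte, "koi8_r"))
```

With the second function, byte 229 from a koi8-r page comes out as a reference to code point 1045 (U+0415) instead of 229.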

-- SergeySolyanik - 16 Apr 2003

The simplest fix is just to remove all XML entity encoding for search results, for all non-Unicode character sets (UTF-8 not being supported yet). See NationalCharactersEncodedInSearchResults, which I have re-opened as a result, and thanks for logging this. I've attached an example of an RSS feed that includes one URL to a TWiki topic with a KOI8-R name. This should work in any browser since the encoding on this page is KOI8-R, demonstrating that avoiding use of XML entities will fix this bug. The summary doesn't work in the attachment since it uses XML entities, illustrating the issue.
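As an illustration of the no-entities approach, here is a small Python sketch (again not TWiki code; `rss_title_fragment` and the sample title are made up). It writes the 8-bit characters through unchanged in the declared charset and escapes only XML markup characters.

```python
from xml.sax.saxutils import escape

def rss_title_fragment(title: str, charset: str) -> bytes:
    """Build a tiny XML fragment whose bytes are in the declared charset,
    with no numeric references for national characters."""
    declaration = f'<?xml version="1.0" encoding="{charset}"?>\n'
    element = f"<title>{escape(title)}</title>\n"   # escapes only &, < and >
    return (declaration + element).encode(charset)

fragment = rss_title_fragment("Привет & пока", "koi8-r")
print(fragment)
```

Because the declaration and the bytes agree, a reader that honours the declared encoding should recover the original text.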

-- RichardDonkin - 16 Apr 2003

Topic attachments

WebRss-koi8-r-cutdown.xml (r1, 3.0 K, 2003-04-16 15:16, UnknownUser): KOI8-R encoded RSS feed, cut down.
Topic revision: r6 - 2003-04-16 - RichardDonkin
 