
Question

I'm running TWiki under OS/390, and all the links for edit, search and upload are rewritten with escaped EBCDIC characters, so the search form at TWiki.Main.WebHome is written as

action=/twiki/bin/search/%d4%81%89%95/SearchResult .
The web server can't translate these escaped resources on the fly from IBM-1047 to ISO-8859-1.
'Main' = 'D4818995'x (IBM-1047) = '4D61696E'x (ISO-8859-1).
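
The byte values quoted above can be reproduced on an ASCII machine with a small Perl sketch. It assumes a Perl with the Encode module (bundled since 5.8, so newer than the 5.6.1 listed below) and uses Encode's `cp1047` name for IBM-1047; the helper name `hex_bytes` is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of $text in the given
# encoding as an upper-case hex string.
sub hex_bytes {
    my ($encoding, $text) = @_;
    return uc(unpack('H*', encode($encoding, $text)));
}

my $word = 'Main';
printf "IBM-1047:   %s\n", hex_bytes('cp1047',     $word);   # D4818995
printf "ISO-8859-1: %s\n", hex_bytes('iso-8859-1', $word);   # 4D61696E
```

Every letter of 'Main' has its high bit set in IBM-1047, which is why each one ends up as a %xx escape in the search URL.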

see attached file

Fixed - see patch file below.

Environment

TWiki version: TWikiRelease01Feb2003
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: OS/390
Web server: IBM HTTP V5.3 for OS/390
Perl version: 5.6.1
Client OS: Win2000
Web Browser: IE-Explorer

-- OliverEichhorn - 27 Aug 2003

Answer

I can't give you an answer but a place to look further, TWikiOnMainframe.

-- PeterThoeny - 28 Aug 2003

I was running the 01 Sep 2001 TWiki for evaluation without any problems. The new version behaves differently and is no longer usable on a mainframe. TWiki without editing isn't a Wiki. Another question: why is this kind of character escaping used only for editing, searching and uploading, while all other links are not escaped? (see file)

-- OliverEichhorn - 28 Aug 2003

Please attach the output of testenv as an HTML file, as mentioned in the SupportGuidelines. I suspect this is due to the way that InternationalisationEnhancements require POST URLs to be encoded when using Mozilla-type browsers. Can you test inline searches (e.g. the SiteMap) - if these work but form-driven searches do not, it is probably POST related.

-- RichardDonkin - 28 Aug 2003

It's a problem of I18N and our mainframe. On the mainframe, all files and CGI output are translated "on the fly" by the web server according to the settings in httpd.conf. In my case the output is converted from EBCDIC IBM-1047 to ASCII ISO-8859-1. The function INTURLENCODE sees the locale de_DE.IBM-1047 and encodes the strings as EBCDIC, but these encoded chars don't match ASCII ISO-8859-1. To bypass this problem I will remove all INTURLENCODE from the templates.

-- OliverEichhorn - 28 Aug 2003

Good to see you are using I18N :-) ... Try changing the following line of code in TWiki.pm:

    $_[0] =~ s/%INTURLENCODE{(.*?)}%/&handleUrlEncode($1,1)/ge;
so that it reads (not tested):
    $_[0] =~ s/%INTURLENCODE{(.*?)}%/$1/ge;

This turns INTURLENCODE into a no-operation, avoiding edits to all the templates. I suspect a better solution is needed for this area that encompasses TWikiOnMainframe as well as Mozilla users, all with I18N. Let me know if this works.

-- RichardDonkin - 28 Aug 2003

Thanks Richard for your support of an unusual platform (EBCDIC). I've changed the line, but that doesn't solve my problem. I looked around a little and found the line

# Make Edit URL unique for every edit - fix for RefreshEditPage
$_[0] =~ s|%EDITURL%|"$scriptUrlPath/edit$scriptSuffix/%URLENCODE{\"%WEB%/%TOPIC%\"}%\?t=" . time()|ge;
and changed this line to
# Make Edit URL unique for every edit - fix for RefreshEditPage
$_[0] =~ s|%EDITURL%|"$scriptUrlPath/edit$scriptSuffix/%WEB%/%TOPIC%?t=" . time()|ge;
and now the links for EDIT are OK (see file). I think this kind of problem is unique to the combination of platform and web server. IMHO the best solution would be an extra parameter for INTURLENCODE and URLENCODE, set to the ENCODINGCP, so that these functions use the "hardcoded" codepage.

I will test the rest of TWiki and report further problems in this topic.

-- OliverEichhorn - 29 Aug 2003

I used to be a Unix sysadmin on an Amdahl mainframe, so I have at least some affinity with mainframes :-)

I would need to really think about exactly what parameters need changing, and more testing is needed of course, but I think I see what the problem is now. Not sure why the URLENCODE is there in the Edit link, to be honest - in fact it probably shouldn't be there, I need to look at CVS:lib/TWiki.pm history to see.

-- RichardDonkin - 29 Aug 2003

My changes have created another problem. The URL of the form tag got quotation marks, which break the logic of WEBNAME processing (see attached files from 01 Sep 2003). I will install the older version in parallel to this one and switch between the versions once the problems are resolved. BTW Richard, is your company running Unix System Services under OS/390 or native Unix?

-- OliverEichhorn - 01 Sep 2003

I'm not sure what is happening here - certainly double quotes have been stripped from the Edit file (attached 10:19) in some places, but I'm not sure what is doing this. TWiki wouldn't do this by itself, so it is some sort of interaction with your server and environment. I would like to get this sorted out if you have time to pursue this on a test installation of TWiki. It would help if you could use another web browser (e.g. OperaBrowser or ideally CygWin's wget or lynx) to save the file - IE5.5 seems to be modifying it as it is saved, which could be introducing confusing changes.

The Preview file posted just now doesn't seem to have these IE changes and is a bit clearer - not sure why it is saying <form action="/twiki/bin/search/"Main"/SearchResult"> - could this be an artefact of how you've edited the templates, perhaps leaving in some double quotes? It doesn't occur elsewhere so I don't think it's a bug in core code. Please post the template for Preview here, but also consider using my suggested edit earlier with completely unmodified templates.

I don't use mainframes at all now - I used to use native Unix (Amdahl UTS) on a mainframe, but that was 15 years ago.

-- RichardDonkin - 01 Sep 2003

I didn't change the templates; the only changes applied to TWiki are the lines mentioned above in TWiki.pm. I'm in a bit of trouble installing PC software, which is normally not allowed ;-) and Lynx under OS/390 has some problems with TWiki URLs (Lynx Version 2.8.3rel.1 (23 Apr 2000) Built on os390 Jan 7 2002 16:41:46). See the attached files prefixed Opera for the results.

-- OliverEichhorn - 01 Sep 2003

I see what's happening now - try these 2 lines in TWiki.pm, they should avoid the double quotes being inserted in the URL (which only happens with the hacked null INTURLENCODE function):

    $_[0] =~ s/%INTURLENCODE{"(.*?)"}%/$1/ge;
    $_[0] =~ s/%INTURLENCODE{(.*?)}%/$1/ge;

This is just a temporary hack to emulate the double quote stripping normally done by extractNameValuePair, called from handleUrlEncode - my original fix was too simplistic.
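
The difference between the first one-line hack and the two-line version can be seen in a small standalone sketch. The sample form tag and the function names `strip_simple` and `strip_quoted_first` are assumptions for illustration (the real substitutions run on `$_[0]` inside TWiki.pm), and the `/e` flag is dropped here because a plain `$1` replacement does not need it.

```perl
use strict;
use warnings;

# First attempt: drop the tag but keep whatever was inside the braces,
# including any double quotes around the argument.
sub strip_simple {
    my ($text) = @_;
    $text =~ s/%INTURLENCODE\{(.*?)\}%/$1/g;
    return $text;
}

# Second attempt: handle the quoted form first, then the unquoted form.
sub strip_quoted_first {
    my ($text) = @_;
    $text =~ s/%INTURLENCODE\{"(.*?)"\}%/$1/g;
    $text =~ s/%INTURLENCODE\{(.*?)\}%/$1/g;
    return $text;
}

# Assumed sample of a template line after %WEB% has been expanded.
my $template = '<form action="/twiki/bin/search/%INTURLENCODE{"Main"}%/SearchResult">';

print strip_simple($template), "\n";
# <form action="/twiki/bin/search/"Main"/SearchResult">
print strip_quoted_first($template), "\n";
# <form action="/twiki/bin/search/Main/SearchResult">
```

The first output shows the stray quotes seen in the Preview file; the second is the URL the form should have.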

Also, from the message near top of this page, it seems that the browser understands ISO-8859-1 (ASCII based), not the IBM-1047 (EBCDIC based) character set:

action=/twiki/bin/search/%d4%81%89%95/SearchResult .
The web server can't translate these escaped resources on the fly from IBM-1047 to ISO-8859-1.
'Main' = 'D4818995'x (IBM-1047) = '4D61696E'x (ISO-8859-1).

This seems to be the webserver trying to translate this URL within the outbound HTML page as it is served. The URL encoding fixes should now avoid this, but generally this is somewhat challenging - the charset is determined automatically from the locale, and used by Perl to determine what is upper case alphabetic and so on, so it's important that it's IBM-1047 internally to TWiki; however, the browser needs to think it is getting ISO-8859-1 (since that's what the mainframe web server is translating IBM-1047 to). So, once you have the URLs working, you might want to also do something like this in CVS:lib/TWiki.pm (if TWiki is still not working OK) - just comment out the following line in the setupLocale function:

   $siteCharset = $1 if defined $1;

This means that $siteCharset remains set to the default value (ISO-8859-1). TWiki will then use this charset in the HTTP headers and the HTML page body - however, this may also confuse your web server or browser, so experimentation may be needed! Generally, the HTTP headers override the HTML page, but the browser and server may behave differently here.
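
As a rough sketch of the effect, the following standalone Perl shows a charset being taken from the locale suffix or left at the default. It is not the actual setupLocale code: the function `site_charset`, its flag argument and the regular expression are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the charset announced to the browser.
# With $use_locale_charset false, the default is kept, which mimics
# commenting out the "$siteCharset = $1" line.
sub site_charset {
    my ($locale, $use_locale_charset) = @_;
    my $charset = 'ISO-8859-1';    # default value mentioned above
    if ($use_locale_charset && $locale =~ /\.([A-Za-z0-9_-]+)$/) {
        $charset = $1;             # e.g. IBM-1047 from de_DE.IBM-1047
    }
    return $charset;
}

print site_charset('de_DE.IBM-1047', 1), "\n";    # IBM-1047
print site_charset('de_DE.IBM-1047', 0), "\n";    # ISO-8859-1
print 'Content-Type: text/html; charset=',
      site_charset('de_DE.IBM-1047', 0), "\n";
```

Note that this only changes what is announced; the locale, and with it Perl's idea of upper and lower case, stays IBM-1047.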

This is a fairly difficult environment in which to run TWiki I18N, to say the least! There are probably quite a few ASCII dependencies in TWiki, e.g. the URL encoding code assumes that anything with 8th bit set must be URL encoded, which in turn prevents the web server from translating charsets properly - hence the problems you've encountered. Some interesting links are Perl and EBCDIC (article), the perlebcdic POD page, and Perl's README.os390 page.
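
The 8th-bit assumption can be illustrated with a short sketch run on an ASCII machine. The function `encode_high_bit` is a simplified stand-in written for this example, not the real handleUrlEncode, and Encode's `cp1047` is used to produce the IBM-1047 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Simplified stand-in for an ASCII-minded URL encoder: percent-encode
# every byte with the 8th bit set and leave all other bytes alone.
sub encode_high_bit {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/sprintf('%%%02x', ord($1))/ge;
    return $bytes;
}

# In ISO-8859-1 the letters are below 0x80, so nothing is escaped.
print encode_high_bit(encode('iso-8859-1', 'Main')), "\n";   # Main

# In IBM-1047 every letter is above 0x80, so everything is escaped.
print encode_high_bit(encode('cp1047', 'Main')), "\n";       # %d4%81%89%95
```

Once the letters are hidden inside %xx escapes, a byte-level charset translation in the web server no longer sees them as letters, so they reach the browser still carrying EBCDIC values.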

Lynx does work, at least where I've tested it, as mentioned in BrowserIssues, but I'm sure the URLs are not quite sorted out yet, and Lynx may handle the charset settings differently to IE.

Re the Edit URL encoding above - this CVS entry shows that I introduced this code to fix Mozilla I18N problems (see MozillaURLEncodingWithI18N), but fortunately by making it use INTURLENCODE this can be turned off in the same place as the POST URLs - now in TWikiAlphaRelease, see CVS:lib/TWiki.pm for this minor change (from 3 Sept, due to CVSweb lag).

-- RichardDonkin - 02 Sep 2003

Thanks Richard, these changes did it. :-)

I have to clarify the behaviour of the webserver after playing with my "illegal" Opera, which is better than our M$-Browser.

The webserver translates only static pages on the fly from IBM-1047 to ISO8859-1.

CGI-Output doesn't get translated.

In my own CGI REXX script I do the translation of the URL escaping to ISO8859-1 myself, and in DB2 connections I translate between IBM273 and ISO8859-1 because of special German chars such as ä, ö, ü.

BTW TWiki sets the correct encoding, IBM1047, as now seen in Opera ....

With these fixes, Lynx on the mainframe is also OK.
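
A translation of URL escapes along these lines might look as follows in Perl. This is only a sketch of the idea, not Oliver's REXX code: the function name `reescape_to_latin1` is invented, the input is assumed to consist entirely of %xx escapes of IBM-1047 bytes, and Encode's `cp1047` stands in for IBM-1047.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take a string made only of %xx escapes of
# IBM-1047 bytes and return the same text escaped as ISO-8859-1.
sub reescape_to_latin1 {
    my ($escaped) = @_;

    # Undo the percent-escaping to get the raw IBM-1047 bytes.
    (my $ebcdic = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # IBM-1047 bytes -> characters -> ISO-8859-1 bytes.
    my $latin1 = encode('iso-8859-1', decode('cp1047', $ebcdic));

    # Escape every byte again, now with ISO-8859-1 values.
    return join '', map { sprintf('%%%02x', ord($_)) } split //, $latin1;
}

print reescape_to_latin1('%d4%81%89%95'), "\n";   # %4d%61%69%6e
```

A real script would also have to deal with unescaped characters mixed into the URL, which this sketch leaves out.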

See ebcdic.patch file for changes required to get TWiki working. -- RD

The web pages above are interesting from my mainframe point of view, but they only touch on some of the problems.

For example, I encode emails as Quoted-Printable with a homegrown REXX according to RFC1521 sect. 5.1; maybe you can do something similar for URL encoding.

Please feel free to contact me about this issue.

-- OliverEichhorn - 03 Sep 2003

Glad it worked! Are you now able to use German accented characters in WikiWords? If so, please comment on InternationalisationDiscuss if possible - any feedback on I18N support is very useful.

Judging from the testenv output, and having skimmed the manual for IBM HTTP Server for OS/390 (Appendix C's AddEncoding and AddCharset are relevant, as is the section on Codepages), I think it's likely that the server is translating all HTTP interactions, including those to and from TWiki, from IBM-1047 to ISO-8859-1. The IBM-1047 charset is used for the filesystem and by the TWiki Perl code, and set in the FSCP environment variable, while the ISO-8859-1 charset is used on the network and by the browser, and set in the NETCP environment variable. This is the default setting according to the docs.

The web server may well be re-writing the HTTP headers so that the browser interprets the HTML as ISO-8859-1 (headers normally override the HTML page's charset). (I'm assuming you didn't make the change above re $siteCharset in TWiki.pm.) In any case, Opera and IE don't support EBCDIC/IBM-1047 so some translation must be happening somewhere for these pages to be rendered into ASCII/ISO-8859-1 that I can view here as attachments.

It would be good to see the REXX scripts that you have done (not sure what the first one does), since there may be a small change to TWiki that would address this.

I'd also like to do a core code patch that makes INTURLENCODE configurable, at least - this should be fairly easy. For now, MozillaBrowser probably won't work with TWikiOnMainframe when using I18N, so it would also be useful to address that at some point. It may be possible to auto-detect this web server since it has somewhat unusual environment variables etc, and just use that to set $siteCharset.

One useful spin-off of all this is that the IBM HTTP Server manual documents an algorithm that solves the difficult problem of how to recognise a URL that includes encoded escape sequences as UTF-8 or some other character set - see my posting on InternationalisationUTF8.

-- RichardDonkin - 03 Sep 2003

Usage of German accented characters in WikiWords is working fine, but still without URL escaping (see file).

At TWikiOnMainframe I've written a little how-to for installing on a mainframe.

I will send the REXX execs to Richard by email.

Thanks to all the contributors of TWiki.

-- OliverEichhorn - 04 Sep 2003

A cleaner fix for this issue has been tested by Oliver and is now in CVS - see TWikiAlphaRelease, or use the ebcdic.patch file below for TWikiRelease01Feb2003. This fix should now work for any platform on which Perl runs in EBCDIC, including OS/400 (where one Perl port uses EBCDIC) and VM Open Edition. The code now tests directly for EBCDIC rather than a specific web server, so no configuration or template editing is required.
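
A direct test for EBCDIC can be written along the lines described in the perlebcdic page, by checking the code point of a known letter. The sketch below is not the content of ebcdic.patch; the function names and the idea of skipping the high-bit encoding on EBCDIC are assumptions based on the discussion above.

```perl
use strict;
use warnings;

# 'A' is code point 193 in EBCDIC code pages and 65 in ASCII.
sub is_ebcdic_platform {
    return ord('A') == 193 ? 1 : 0;
}

# Hypothetical wrapper: leave text alone on EBCDIC, otherwise
# percent-encode bytes with the 8th bit set. The optional second
# argument overrides the platform test, which is handy for testing.
sub url_encode_unless_ebcdic {
    my ($text, $ebcdic) = @_;
    $ebcdic = is_ebcdic_platform() unless defined $ebcdic;
    return $text if $ebcdic;
    $text =~ s/([\x80-\xFF])/sprintf('%%%02x', ord($1))/ge;
    return $text;
}

print is_ebcdic_platform() ? "EBCDIC\n" : "ASCII\n";
print url_encode_unless_ebcdic("caf\xE9", 0), "\n";   # caf%e9
```

Testing the platform rather than the web server keeps the check independent of httpd.conf settings and environment variables.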

People will also need a previously applied patch, so the complete set is as listed below under Final patches.

I'll also keep EBCDIC in mind for EncodeURLsWithUTF8, which will enable Mozilla to work with TWikiOnMainframe.

-- RichardDonkin - 10 Sep 2003

This topic should be split into two: the support part here, the implementation part in the Codev web. Then set the link in CairoRelease accordingly. Let's keep TWiki.org organized :-)

-- PeterThoeny - 29 Sep 2003


Category: TWikiPatches

Final patches - these are against TWikiAlphaRelease, should work for TWikiRelease01Feb2003:

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| I18N_SandBox_WebHome..html | r1 | 10.0 K | 2003-09-04 - 13:44 | OliverEichhorn | Sandbox with german accented chars |
| Opera_TWiki_Main_WebHomePreview.html | r1 | 10.3 K | 2003-09-01 - 12:49 | OliverEichhorn | Preview (Opera) |
| Opera_TWiki_Sandbox_WebHome..html | r1 | 4.0 K | 2003-09-01 - 12:48 | OliverEichhorn | Edit Sandbox (Opera) |
| TWiki_Main_WebHome.htm | r1 | 12.0 K | 2003-08-27 - 10:33 | OliverEichhorn | Twiki_Main_WebHome.htm |
| TWiki_Main_WebHome_fixed.htm | r1 | 12.0 K | 2003-08-29 - 10:30 | OliverEichhorn | Output after changes to TWiki.pm |
| TWiki_Main_WebHomepreview.htm | r1 | 10.2 K | 2003-09-01 - 10:20 | OliverEichhorn | Preview Sandbox got Main Topic |
| TWiki_Sandbox_WebHomeedit.htm | r1 | 4.0 K | 2003-09-01 - 10:19 | OliverEichhorn | Edit Sandbox |
| ebcdic.patch | r1 | 2.6 K | 2003-09-10 - 18:06 | RichardDonkin | EBCDIC support patch (as in CVS) |
| testenv.htm | r1 | 13.5 K | 2003-08-28 - 14:36 | OliverEichhorn | output of testenv |
Topic revision: r21 - 2003-09-29 - PeterThoeny
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.