Tags:
create new tag
, view all tags

Bug: Meta Data Handler can't Process CR-LF Line Endings

A meta data line ending in "\r\n" is not recognized as a valid meta data line, it shows up literaly as META:DATA{....} in the topic text. Plugins that manipulate meta data might introduce "\r\n" line endings instead of the default "\n" line ending.

Test case

Edit a topic file on the shell level and save with CR-LF line endings; or try the undocumented repRev command (which is broken in the latest Beta release)

Environment

TWiki version: TWikiRelease01Feb2003
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: N/A
Web server: N/A
Perl version: N/A
Client OS: Windows
Web Browser: Netscape

-- PeterThoeny - 29 Jan 2004

Fix record

Accept "\r\n" and "\n" line endings when parsing the meta data. Change is in TWikiAlphaRelease and at TWiki.org.

*** ./b/lib/TWiki/Meta.pm   2004-01-17 00:23:58.000000000 -0800
--- ./a/lib/TWiki/Meta.pm   2004-01-28 22:23:35.000000000 -0800
***************
*** 298,304 ****

      my $newText = "";

!     foreach ( split( /\n/, $text ) ) {
          if( /^%META:([^{]+){(.*)}%$/ ) {   # greedy match for ending "}%"
              my $type = $1;
              my $args = $2;
--- 298,304 ----

      my $newText = "";

!     foreach ( split( /\r?\n/, $text ) ) {
          if( /^%META:([^{]+){(.*)}%$/ ) {   # greedy match for ending "}%"
              my $type = $1;
              my $args = $2;

-- PeterThoeny - 29 Jan 2004

Follow up

Just realized that there is a change in storing files:

  • Before this fix:
    • Meta data lines terminated by "\n" (LF) only
    • Topic text lines terminated by "\r\n" (CR-LF)
  • After this fix:
    • Meta data lines terminated by "\n" (LF) only
    • Topic text lines terminated by "\n" (LF) only

Does that impose a problem?

-- PeterThoeny - 29 Jan 2004

Yes -- you now have no way of correctly determining the end of the topic text, since having %META lines in topic text is valid, having them at the end of the topic is also valid. (And indeed functions as expected - there's been numerous discussion in Codev where this has been used as a demonstration.)

This is the same problem that SMTP has with embedded full stops and mbox format mail systems have with ^From lines. This will break content - (including pages in Codev). An example page that in the past would still be edittable now is here:

Note that the debug version now contains what the raw=1 version above shows on a standard TWiki:

Also note that the text has been mangled - the text as entered looked like:

The topic text is now returned as:

And the raw view is:

I suspect even this simple view might confuse TWiki - that's not the intent, the intent is to show the problems this causes. (The last example TWiki has problems with and inserts the rogue <p /> )

The reason this happens is because you've constrained the by your earlier bugfix (without any tests that it breaks other content):

          if( /^%META:([^{]+){(.*)}%$/ ) {   # greedy match for ending "}%"

Used to be: (and is in the majority of people's installations)

          if( /^%META:([^{]+){(.*)}%/ ) {

To summarise, the change you've made prevents metadata from being allowed at the end of topic text, thus breaking content on TWiki.org and elsewhere. Reverting to

          if( /^%META:([^{]+){(.*)}%/ ) {
Would re-open the old bug, but resolve this one. (And indeed if this patch was applied by anyone other than yourself, this is probably what would happen)

Incidentally the original bug was a problem caused by lack of escaping metadata content and storage of metadata does not go through a marshall/unmarshalling phase.

Since your "fix" actually breaks more content than it fixes, I've change the TopicClassification from BugResolved to BugReport .

-- MS - 29 Jan 2004

The new spec with identical line endings for meta data and topic text is orthogonal and more clear. My only concern is to make sure that this does not break existing tool functionality like RCS on different platforms, RcsLite and diff (which can be fixed). Because of this I leave this bug report open for a while. Can anyone help test this?

Entering meta data into topic text is not documented and should not be used. The common practice to escape TWiki variables is %<nop>NAME%, that should also be the case for meta data.

-- PeterThoeny - 03 Feb 2004

See Sandbox.WillProbablyResultInCorruptedText for content that is bust, and can no longer be editted by end users - thereby breaking the wiki nature of those pages (the reason I've put it in the sandbox incidentally).

The content was entered looking like this - note that content was entered inside <verbatim> tags - another method that people expect to just work . The example is also not specious - there are many places on twiki.org where people discuss meta data handling changes.

    This is text preceding the inline meta definition surrounded by verbatim. We are discussing here what these declarations do, and considering
    extending this to do more things, which is why we have an extra field
    <verbatim>
    ---
    %META:FORM{name="NonExistantForm"}%
    %META:FIELD{name="TopicClassification" title="TopicClassification" value="DocumentationPage"}%
    %META:FORMLOOKANDFEEL{Topic="NonExistantFormLookAndFeel"}%
    ---
    </verbatim>
    
    This is text preceding the inline meta definition surrounded by verbatim. We are discussing here what these declarations do, and considering
    extending this to do more things, which is why we have an extra field
    
This is a trivial way to a) break a wiki with this feature b) make it difficult (not impossible) for the normal audience to fix without access to WebPreferences.

This change makes TWiki vulnerable to defacement in such a way that SoftSecurity is also compromised. (Since the community can't fix this easily) It might be said that this isn't a problem, but the corruption of topic text on it's way to store, the misdisplay of content, the fact it's a security issue as well strike me as major problems.

Furthermore, to make line endings orthoganal there is no point in storing topic text with trailing ^M. After all, if this security flaw that breaks soft security is ignored then you lose all the benefits you used to have of ^M's still being there (differentiation of metadata from topic text). At that point removing the ^M's when storing topic text would make logical sense.

However in that situation a user may start editting a topic which ends with ^M on disk, edits it, and the ^M's are replaced with standard line endings. This results in a potential 2 words edit looking like the entire page contents has changed.

Consider what I'm saying for once, not who is saying it.
Corruption of user's text is about the worst possible thing any application can do, and it's a huge problem for a wiki.
( "Oh that wiki software corrupted my data, now we're happily using..." )

-- MS - 05 Feb 2004

Could everyone also consider how they are saying things? Consider this revision for example.

-- SamHasler - 05 Feb 2004

I do consider how I'm saying things. In my experience PeterThoeny does NOT (appear to) listen when people raise points that are counter to his own viewpoint - he dismisses these independent opinions. He has dismissed the point originally of no longer being able to tell topic text from meta data. He has dismissed it breaks content (on twiki.org of all places, as well as other sites). He has dismissed he's changing semantics, without providing a mechanism to change topic storage format. (Which he's dismissed as going against the TWikiMission)

That is why I had phrased things more forcefully. You have diluted that. I was also specifically speaking to PeterThoeny when I said "If you're serious...". You have changed the meaning of what I've said.

If you intend to change meaning of what I've said, delete my signature from the contribution. Better yet, refactor the discussion, since then you won't (accidentally) misrepresent what other people have said.

The point stands - this small lexical* change is bad and breaks content, (due to also changing siginifcantly semantics) introduces a security hole that is hard to fix with soft security.

  • * It's specifically about how the written text is stored/dealt with, and specifically a regex change which is a lexical issue in parsing parlance.

In case it helps, I'll write a brief doc on the structure of ondisk topic format to illustrate the change that this change causes and why I'm so adamant that this introduces a nasty bug (aside from all the other reasons noted above).

-- MS - 06 Feb 2004

First my apologies, as I must be the guilty party that introduced this problem (different line ending for meta). Second, lets please focus on all understanding the issue here, this will not be achieved by making alegations against Peter.

I agree with Peter that having all line endings the same is important. Clearly this must no break with existing content. If use of meta examples in verbatim tags fails, the we clearly need to address this. I'm not clear from the above, that there are any other bugs that will be caused by cleaning up line endings. Have I understood correctly?

I should add that the intention was to destinguish meta data purely on the basic of %META:...% being present at the start of a line (verbatim should clearly be taken note of). Unless I'm missing something, I can't see why you would want another mechanism. Although this might mean that TWiki needs to throw out META data added during a normal edit.

-- JohnTalintyre - 06 Feb 2004

Could "real" meta be defined as being on the first & last lines in the file, and if any are added at the start or end during edit then an extra line end could be automatically inserted between the "real" and the "edit" meta tags.

-- SamHasler - 06 Feb 2004

No, you can't define it that way. (Prove me wrong with code. Please!) The reason is simple - people can - and do add meta data in topics manually. Often at the beginning or end of topics. Furthermore, how would you tell which on disk format you were dealing with?

  • * I can find more examples if required - it's been suggested in many places.

A good wrapper mechanism provides a clear mechanism to define the contents of what's been wrapped. Whilst not a perfect wrapper (I don't believe a perfect one exists) lines ending ^M versus not ending ^M is a good wrapper by that particular definition. If John did it by accident, it explains alot, but it doesn't change the fact it's a good wrapper. I would like to see a better wrapper. But if that is done, the format version number at the top of the topic must be bumped (ie from format="1.0" ) Calls from plugins for raw topic text should ideally be presented a representation consistent with that format. Whilst this might be called "unsupported", the fact it's had to be used surely takes precedence?

John,

> I'm not clear from the above, that there are any other bugs that will be caused by cleaning up line endings. Have I understood correctly?

Yes. See also: structure of ondisk topic format.

Bottom line - this change dramatically changes the interpretation of on-disk storage introducing further bugs. It's also a knockon from Peter's earlier change on 17 Jan (or so).

On a secondary note - this point is bad: (IMO)

    the intention was to destinguish meta data purely on the basic of %META:...% being present at the start of a line (verbatim should clearly be taken note of). Unless I'm missing something, I can't see why you would want another mechanism. Although this might mean that TWiki needs to throw out META data added during a normal edit.

Reasons users want to be able to add META in the topic text are:

  • They don't want to define a webform, but do want to use metadata. (There's no support for arbitrary fields)
  • Users want to define a webform, but can't because the local site has denied them that ability. (Goes against the WikiWay)
  • Because they can, and hence do.
  • Because they want to talk about meta tags.
  • Because they might say something like:
    ... yeah, the storage on disk starts off with a
    %META:BLA{... type declaration, and continues on until the end of the line. Customisation using %PARAM{"param"}%
    is fairly common, replacing the older %URLPARAM{"param"}% in many locations.
    
And last but by no means least:
  • Because the topic text container should preserve anything they choose to say, and not hijack it.

Unless this metadata shows up in the topic text edit box, they effectively end up with a write once mechanism.

On another point, whether the verbatim was there or not is irrelevant - I was providing a fairly reasonable example of how discussion on TWiki.org can happen (I can find further real examples if needed), with the storage mechanism corrupting the contents.

A storage mechanism that seperates that separates wrapper information (meta data) from the thing being wrapper (topic text) should do so in such a way that it can cope with any form of content it's wrapping. (Marshalling & unmarshalling as it needs to) If it starts looking at the content - ie the contents of <verbatim> then the wrapper mechanism is bust.

Second note, the bug report states this:

  • "Edit a topic file on the shell level and save with CR-LF line endings;"
    • Surely this is unsupported? Anyone could break TWiki by editting the raw topic text, and the obligation is on the onus of the user, not the code to cope with this.
    • Furthermore trying this with a Store.pm with unmodified semantics from BeijingRelease shows that this bug does not exhibit itself when doing precisely this. Example:

This means the TWiki version noted up the top (BeijingRelease) is wrong - it's actually a problem with recent TWiki alphas. BeijingRelease copes with this "issue" fine. (Surely BugRejected against BeijingRelease in that case, but BugReport for current alpha?)

-- MS - 07 Feb 2004

Question moved to StructureOfOndiskTopicFormat -- ThomasWeigert - 07 Feb 2004

Hi Thomas,

Peter asked whether his change broke stuff. I said it did. I provided examples. I then provided more examples. I've provided further examples now of people talking about ways of doing things which his change breaks. I've also posted structure of ondisk topic format.

I think I've made my point. Whether it will be accepted is another matter. I'm ceasing contribution on this issue at this point - clearly breaking the ondisk storage format is not a problem for TWiki.org, or else I would not have recieved the response I did. There are only so many hours in the day. (Also there's a saying, if at first you don't succeed, try, try again and then give up, you're probably making yourself look a fool)

My "decision" will have no bearing at all on what TWiki.org does or does not do. If you take a CVS alpha update at this time however, your content will be busted*, since PeterThoeny has already taken that decision (this is checked in). If you want that decision reversed, please read structure of ondisk topic format and decide for yourself whether this is a big enough issue for you to disagree with Peter on.

    * If any of your users are using the facility you mentioned on MetaDataRendering regarding embedding META:FIELD values in a topic. (It's simple enough to check of course - write a perl one liner to grep your topic text for lines

The "bug" described at the beginning of this topic does not exist in Athens or Beijing release - it only exists in recent TWiki alphas.

Examples of content broken on TWiki.org:

Possibly unrelated oddness:

Who ever wanted to edit a METASEARCH inside a topic anyway...

-- MS - 07 Feb 2004

Michael, can you explain what you think is wrong in these examples? I looked at each of the indicated pages and could not see what you were referring to. Please explain... Thanks.

-- ThomasWeigert - 08 Feb 2004

Edit | Attach | Watch | Print version | History: r20 < r19 < r18 < r17 < r16 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r20 - 2004-09-24 - MartinCleaver
 
  • Learn about TWiki  
  • Download TWiki
This site is powered by the TWiki collaboration platform Powered by Perl Hosted by OICcam.com Ideas, requests, problems regarding TWiki? Send feedback. Ask community in the support forum.
Copyright © 1999-2015 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.