
Topic closed. See TWikiMetaData for current definition of meta data file format. Start new MetaDataDev topic for further discussion (and search for references in existing discussions).



This started in NavigationByTopicContext. It would be nice to have a generic way to store meta-data of a topic. Meta-data is any data that cannot directly be edited by the user in free form.

Currently we have this type of meta-data:

  • File attachments
  • Category table

Possibly we have this type of additional meta-data:

  • Author, date, version number of last revision (this data is in RCS but it would improve performance to have it also here)
  • Original topic name for renamed topic (recursive?)
  • Parent of a topic

File attachments and category table are currently stored at the end of the topic text file, separated by <!--TWikiAttachment--> and <!--TWikiCat-->, respectively. Both data used to be an XMLish HTML table, which is OK to parse, but could be simplified. John has simplified the attachment table as described in AttachmentsUnderRevisionControl, each entry is now a simple variable like %FILEATTACHMENT{"Sample.txt" version="1.3" ... }%

We could go along this line for storing generic meta-data. Need to find a generic solution that is easy to parse and manipulate. Brainstorming:

  • Combine current <!--TWikiAttachment--> and <!--TWikiCat--> into one meta-data part that is stored at the end of the topic.
  • Separate meta-data with <!--TWikiMetaData-->
  • Each meta-data entry is a %VARIABLE%

Example raw text format:

Normal text user can edit is here...
<!--TWikiMetaData-->
%TOPICAUTHOR{"Main.PeterThoeny"}%
%TOPICVERSION{"3"}%
%TOPICDATE{"976762663"}%
%TOPICPARENT{"WebHome"}%
%FILEATTACHMENT{"Sample.txt" version="3" ... }%
%FILEATTACHMENT{"Smile.gif" version="1" ... }%
%CATEGORYITEM{"TopicClassification" value="PublicFAQ"}%
%CATEGORYITEM{"OperatingSystem" value="OsWin"}%
<!--TWikiMetaData-->

Another idea is to use XML instead. Example:

Normal text user can edit is here...
<TWikiMetaData>
  <TopicAuthor>Main.PeterThoeny</TopicAuthor>
  <TopicVersion>3</TopicVersion>
  <TopicDate>976762663</TopicDate>
  <TopicParent>WebHome</TopicParent>
  <FileAttachments>
    <Attachment>
      <Name>Sample.txt</Name>
      <Version>3</Version>
      ...snip...
      </Attachment>
    <Attachment>
      <Name>Smile.gif</Name>
      <Version>1</Version>
      ...snip...
      </Attachment>
    </FileAttachments>
  <TWikiCategory>
    <Item>
      <Name>TopicClassification</Name>
      <Value>PublicFAQ</Value>
      </Item>
    <Item>
      <Name>OperatingSystem</Name>
      <Value>OsWin</Value>
      </Item>
    </TWikiCategory>
</TWikiMetaData>

Using the %VARIABLE% format instead of XML is easier to parse since the code is already in TWiki. Thinking long term it is probably better to use XML.

-- PeterThoeny - 06 Apr 2001

One solution is to store the meta-data in the 'comment' field of each version of the topic, although this doesn't really allow the meta-data itself to be version tracked.

Since PackageTWikiStore will have a generic meta-data interface it doesn't really matter how the meta-data is stored.

-- NicholasLee - 07 Apr 2001

I think separating the meta-data out is a very good idea. Both of the above formats would also make it easy to add new information. We could also store topic movement information here. So if topic A is moved to B, the meta data would say where it came from. Additional minimal meta data could be left behind at location A that points to B.

I feel that XML is a better long term solution as it will make TWiki more understandable (by new people). However, I do like the easy (and fast) parsing of the existing format. But, XML is most useful when sending data outside a system, so could store as %VARIABLE%, but present as XML for other systems e.g. search engines.

Of course one can use XML attributes, so:

%FILEATTACHMENT{"Sample.txt" version="3" ... }% -> <FILEATTACHMENT version="3" ...>Sample.txt</FILEATTACHMENT>
or possibly -> <FILEATTACHMENT name="Sample.txt" version="3" />

-- JohnTalintyre - 07 Apr 2001

We have consensus, so let's do it. Shall we do it for the TWikiReleaseSpring2001, what do you think? To me it makes sense because we have a change in file format anyway with the AttachmentsUnderRevisionControl. The change is not so invasive either.

To do items: (idea needs to be refined)

  • Use %VARIABLE% format instead of XML for internal storage. (At a later point when we offer external access we can present it in XML format)
  • Meta-data handling in TWiki::Store:
    • TWiki::Store::readTopic and TWiki::Store::saveTopic are low level and return the raw text including meta-data (no change in spec, except possibly add web name to parameter)
    • Functions on topic text and meta-data:
      • Split raw text into user text and meta-data. Meta-data is an array of variables. One array item is e.g. %TOPICPARENT{"WebHome"}%.
      • Join user text and meta-data array into raw text for storage.
      • Get and set specific variable
  • Change TWikiCategoryTable format to meta-data format in the same way as John has done it for the FileAttachment format (do lazy conversion by reading both formats but writing only in new meta-data format). Idea: In fact, the function that splits the raw text into user text and meta-data array can take care of converting the legacy format, so we have a clean data format that is handled internally.
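
The split and join functions from the list above could be sketched roughly as follows. This is only an illustration under assumptions: the names splitMetaData and joinMetaData are hypothetical, and the sketch assumes the <!--TWikiMetaData--> separator and the one-variable-per-line layout from the brainstorming example. It does not handle the legacy attachment or category formats.

```perl
use strict;
use warnings;

# Assumed separator, taken from the brainstorming example above.
my $SEPARATOR = "<!--TWikiMetaData-->";

# Hypothetical helper: split raw topic text into the user text and an
# array of meta-data variables (one array item per line).
sub splitMetaData {
    my( $rawText ) = @_;
    my $pos = index( $rawText, "$SEPARATOR\n" );
    return ( $rawText ) if( $pos < 0 );    # no meta-data stored yet
    my $userText  = substr( $rawText, 0, $pos );
    my $metaBlock = substr( $rawText, $pos + length( $SEPARATOR ) + 1 );
    # Keep only lines that look like %NAME{...}%; this also drops the
    # closing separator line.
    my @metaData = grep { /^%[A-Z]+\{.*\}%$/ } split( /\n/, $metaBlock );
    return ( $userText, @metaData );
}

# Hypothetical helper: join user text and meta-data array back into raw text.
sub joinMetaData {
    my( $userText, @metaData ) = @_;
    return $userText unless( @metaData );
    return join( "\n", $userText . $SEPARATOR, @metaData, $SEPARATOR ) . "\n";
}
```

With this shape a script would call splitMetaData on whatever readTopic returns, work on the array, and hand the result of joinMetaData to saveTopic, so the low level functions keep their current spec.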

-- PeterThoeny - 07 Apr 2001

Sounds nice, but please check comments in TWikiObjects, as it might justify, add to, or otherwise affect this change.

If metadata is going to be set as %VARIABLES% it would be nice to have a common tag to identify all of them, e.g. %META:TOPICPARENT{"Some_web"}% or even %META{ TopicParent, "Some_web"}%.

And on the implementation, for efficiency sake, I would suggest that a single global? hash be generated by a PackageTWikiTopic? call, that contains all the meta-data for the file.

-- EdgarBrown - 07 Apr 2001

Peter, sounds good to me and I think sensible for TWikiReleaseSpring2001 if we stick (as you suggest) to keeping in the datafile - it will be good to get the older meta data out the way. We'll have to be careful to deal with searching for category information (maybe search can translate from old format to new). Might want to add a more explicit method to search for doing category searching. There should clearly be a way of distinguishing the meta-information and it should be easy to do something cleaner than the current HTML comment entries (which now appear in many Perl modules), but I'm not sure if a special name for the variables is necessary, as they should not be widely available, but it might make things a bit clearer.

I think I can add movement information nicely in this way. But, it will mean that some files only have meta information in and so even if the .txt data file exists, it doesn't mean the topic exists. So, for instance, will need to make sure that upload does not let you upload to a topic that used to exist.

-- JohnTalintyre - 08 Apr 2001

More thoughts on format (based on example above):

%META:FILEATTACHMENT%
%META:CATEGORYITEM%

Notes:

  • Removed <!--TWikiMetaData--> as it seemed a bit messy - not sure if this was a good idea
  • All of topic as one item, just seemed neater than multi entries.
    • [ PeterThoeny ] Seems to be OK. We could consider to have separate entries because search on individual items might be easier, i.e. if you want to search and sort by date.
  • Is parent in TOPIC the parent Web? Previous example used WebHome, but I think of that as default topic for a WEB rather than a parent.
    • [ PeterThoeny ] No, to me, webs are name spaces, and there is not much linking between webs (except for one's signature). Each collaboration team should have its own web, so a web is a bigger entity. A topic parent is the topic where the current topic was created from (clicking on ? mark). See NavigationByTopicContext. A topic parent is needed to improve navigation within a web.
      • [ JohnTalintyre ] I see now, good idea, this is extra information that is lost at present. What if created from go box - leave blank?
  • META before items for easy filtering (make sure user doesn't supply in saved text)
  • META data always together, one line per item.
    • [ PeterThoeny ] Makes sense. The read function should be forgiving though, simply separate all META and non META lines.
  • All meta variables of same type must be together
    • Know that attachment info has finished when first line without META:FILEATTACHMENT is seen.
    • [ PeterThoeny ] I would reverse that for easy rendering. E.g. first the META:FILEATTACHMENT without parameters (will be rendered as table header), then all items. The same for META:CATEGORYITEM.
      • [ JohnTalintyre ] Start is easy to detect because nothing yet written for the tag. However, table footer (e.g. show hidden attachments), requires detecting end. However, not difficult to do with end marker. Also, would think new attachment should go at end, rather than start. Maybe it's cleaner all round not to have these extra empty entries, also order of items on output page does not have to match input file.
  • No Main in names (e.g. author), allows name of this Web to be changed e.g. to People, always prepend this information.
    • [ PeterThoeny ] It should be the REMOTE_USERNAME, the same user name RCS is storing. It is not necessarily the same user name as in the Main web (i.e. pthoeny vs. PeterThoeny)
      • [Main.JohnTalintyre ] I assumed conversion to WikiName (perhaps worth storing both), which in any case is required to do Main prefix.
  • Could also have META:TOPICMOVED{to=...}
  • META:TOPICMOVED could be put into META:TOPIC, but I thought sufficient different to put separately
  • %META:FILEATTACHMENT% with no args makes it easy to add new attachments - a bit messy, thoughts?

Default for topic would be:

%META:TOPICMOVED%
%META:FILEATTACHMENT%
%META:CATEGORYITEM%

-- JohnTalintyre - 11 Apr 2001

See also: ImportingContexts

-- MartinCleaver - 12 Apr 2001

Some more refinements to above example:

%META:FILEATTACHMENT%
%META:CATEGORYITEM%
%META:END%

Changes:

  • Use RCS variables in META:TOPIC - this is particularly useful for the revision number.  
    • [ PeterThoeny ] Makes code very easy, but implies that RCS is used. If we go with this format and we have a different store in the future we should keep the same format.
      • [ JohnTalintyre ] I agree. Unfortunately, I think that getting it to work well will otherwise be very difficult. I was worried that it would be easy to get the revision number wrong. Alternative would be to store topic and then alter it to have information extracted from RCS. I'm not sure of best course here.
  • META:END added, as an easy way of adding new meta variables.

Some code fragments to support above:


# ============================
# e.g. 
# $args = "\"a.txt\" author=\"JohnTalintyre\" ...";
# addMetaData( $topicText, "FILEATTACHMENT", $args, "multiple" );
# If allowMultiple is false then first existing place marker will be replaced
# If metaType missing then added at end - FIXME might need to be more complex
sub addMetaData
{
    my( $text, $metaDataType, $metaDataArgs, $allowMultiple ) = @_;
    addDefaultMetaData( $text );
    
    my $fullMetaData = "%META:$metaDataType\{$metaDataArgs\}%";
    my $emptyReplacement = "";
    if( $allowMultiple ) {
        $emptyReplacement = "\n%META:$metaDataType%";
    }
    
    # Try replacing empty entry
    #TWiki::writeDebug( "-----------> %META:$metaDataType%" );
    #TWiki::writeDebug( "************ $text" );
    if( ! ( $text =~ s/\Q%META:$metaDataType%\E/$fullMetaData$emptyReplacement/m ) ) { # Note o flag didn't work here
        #TWiki::writeDebug( "Store::addMetaData try non-empty" );
        # Try replacing non-empty entry
        if( ! ($text =~ s/\%META:$metaDataType\{[^\}]*\}%/$fullMetaData/m ) ) {
            #TWiki::writeDebug( "Store::addMetaData add to end" );
            # Add to end    
            $text =~ s/(%META:END%)/$fullMetaData\n$1/m;  
        }
    }
    TWiki::writeDebug( "=========== $text" );
    
    return $text;
}

# =======================
# Add meta data if missing
sub addDefaultMetaData
{
    if( $_[0] !~ /^%META:/mo ) {
       TWiki::writeDebug( "Store: no meta data - adding" );
       $_[0] .= "\n";
       $_[0] .= "%META:TOPIC%\n";
       $_[0] .= "%META:TOPICMOVED%\n";
       $_[0] .= "%META:FILEATTACHMENT%\n";
       $_[0] .= "%META:CATEGORYITEM%\n";
       $_[0] .= "%META:END%\n";
    }
}

# =======================
# Add or replace the META:TOPIC entry; the $...$ keywords are left for RCS to expand
sub addTopicMetaData
{
    my( $web, $topic, $text ) = @_;
    
    my $args = "name=\"$topic\" parent=\"$web\" version=\"\$Revision\$\" date=\"\$Date\$\" author=\"\$Author\$\"";
    $text = addMetaData( $text, "TOPIC", $args );
    
    return $text;
}

Notes:

  • Try to do all updates via a small number of routines; reads can be implemented directly for speed.
  • addMetaData to be changed to take in the identifying part of a variable (e.g. attachment filename), to replace $allowMultiple.

-- JohnTalintyre - 12 Apr 2001

Is it too late for me to suggest an alternative implementation?? ;^)

John, your implementation suggests working directly with the text, however I suggest a higher level of abstraction.

Conceptually this is meta-data, that is, data that is not directly intermingled with the text, so the separation should be achieved at the read/write routine level (this would also ease conversion from old twiki style, as the data only has to be interpreted on topic read, perhaps under control of a global flag).

So what I propose is to change the interfaces to readVersion and saveTopic from wikistore.pm. That is at the end of readVersion (pseudo-Perl, as I haven't actually tested any of this):

   my %meta;
   my $tmp2 = $tmp;
   # Move every %META:TAG{args}% line into a hash of arrays keyed by TAG
   while( $tmp2 =~ s/^\%META:([^{}\%]+)\{(.*)\}\%//m ) {
      push @{ $meta{$1} }, $2;   # order within each tag is preserved
   }
   # Remove all meta-data lines, including empty tags, from the user text
   $tmp =~ s/^\%META:.*?\%\n?//mg;
   return ( $tmp, %meta );

Of course, this is only if the data exists, otherwise it would have to create it, probably using the information from rcs, current category implementation, etc.

And somewhere in saveTopic:

   foreach my $tag ( sort keys %meta ) {   # sorted so the stored order is stable between saves
      foreach my $args ( @{ $meta{$tag} } ) {
         $text .= "\n\%META:$tag\{$args\}\%";
      }
   }

That way, after a read, any procedure, just needs to reference $meta{TOPIC}[0] to read the topic string, or $meta{CATEGORYITEM}[i] for each category item, etc. Not to mention that it really eases the path if at some point a different implementation for meta data becomes desirable.

Note that in this implementation it is not necessary to have the extra empty tags, or even the %META:END%, however for this last case, I would probably prefer it to be there.

-- EdgarBrown - 12 Apr 2001

I agree that meta data is best kept separate from the text - but stored with it (makes revision control much easier). Edgar's suggestion revolves around all meta data being recreated for each save, which certainly has appeal. However, I can't help thinking that to do this properly really requires the OO route being worked on for a subsequent version of Store.pm. I think deciding factor should be the method that leads to the cleanest code - more thought and work required.

A point that is not yet addressed is how the meta tags are rendered to output. They are very similar, but not quite the same as existing TWiki variables. It seems reasonable to keep the existing order - attachment, category, revision info, with specific order for attachments and categories.

Regardless of how meta data is processed, I think there may be some merit in the category information appearing at the top of topic. Two reasons:

  • New format can be detected by checking the first line, which is efficient. We don't want to take too much time detecting that the format is the new format and that no conversion is required. (Note: might be worth bringing in a format version as part of meta data.)
  • Data can be easily appended to the topic
I must admit that neither are very compelling reasons.

-- JohnTalintyre - 13 Apr 2001

See also my "* [ PeterThoeny ]" bulleted notes above. Based on that I propose this format:

%META:TOPICINFO{version="$Revision: 1.28 $" date="$Date: 2001/04/26 21:11:24 $" author="$Author: JohnTalintyre $"}%
%META:TOPICMOVED{from="Codev.OldName" by="JohnTalintyre" date="976762680"}%
%META:TOPICPARENT{"NavigationByTopicContext"}%
%META:CATEGORYITEM%
%META:CATEGORYITEM{"TopicClassification" value="PublicFAQ"}%
%META:CATEGORYITEM{"OperatingSystem" value="OsWin"}%
%META:FILEATTACHMENT%
%META:FILEATTACHMENT{"Sample.txt" version="3" ... }%
%META:FILEATTACHMENT{"Smile.gif" version="1" ... }%
%META:END%

Comments:

  • Removed META:TOPIC{name="TestTopic1a" } because it is redundant (we know the topic name)
  • Renamed META:TOPIC to META:TOPICINFO
  • Put META:FILEATTACHMENT before all attachments, not after. Same for META:CATEGORYITEM
  • META:FILEATTACHMENT after META:CATEGORYITEM, for correct ordering and easy rendering

Placing meta-data at the beginning of the text makes sense. The function that returns the topic summary needs to be made aware of that.

I like EdgarBrown's idea of handling meta-data at low level. Either the readTopic / saveTopic separate meta-data, or a layer on top of that. readTopic can also convert from the old file attachment and category table format, so that we have a clean code base.

We can't use a hash, because we have multiple keys with the same name, and because the order of keys is significant. readTopic could simply return ( $text, $metaData ), e.g. a flat meta-data text. Or it could return ( $text, @metaData ), e.g. a meta-data array. An array item contains simply a meta-data text line. Additional functions in TWiki::Store should offer add / remove / replace meta-data item functionality.

  • [ EdgarBrown ] Mmmm... I'm pretty sure you can have duplicated keys in a hash, but I just can't find it in my Perl books... ;^), however that is not the implementation I suggested, see below.
  • [ MartinCleaver ] If you've got two items with the same key, don't you just hang an array off the hash?

-- PeterThoeny - 07 Apr 2001

I fixed the code example of my previous comment to actually generate the metadata hash of arrays that I suggested, you can find attached a simple example that reads the metadata in such format and just prints it out (funny that it's ignoring the empty META tags, I didn't intend it to do that).

If you run the example you'll see that it preserves order inside each metadata category, as it should. If the actual ordering of the individual categories is a problem (which I don't see why it would be so), then I can see your objection.

The reason that I prefer all the meta-data to be parsed up-front, is to avoid function calls or searches to extract the particular meta-data format. If this format ever changes, then there is a single place to deal with it. The reason that I prefer a hash is to ease the addition of meta-data in the future, the functions just need to be aware of the meta-data tags they are handling.

-- EdgarBrown - 13 Apr 2001

Some excellent discussion coming out of this. I think we're agreed that on reading, meta-data should be separated out from the rest of the topic and processed separately. Use of a hash or similar structure for encoding all the data is still subject of debate, but we've still not said much about how the data will be processed, except the order that information is displayed in.

Some "* [ JohnTalintyre ]" bulleted notes above

-- JohnTalintyre - 14 Apr 2001

Here is a refined format:

%META:CATEGORYITEM%
%META:CATEGORYITEM{name="OperatingSystem" value="OsWin"}%
%META:CATEGORYITEM{name="TopicClassification" value="PublicFAQ"}%
%META:FILEATTACHMENT%
%META:FILEATTACHMENT{name="Sample.txt" version="3" ... }%
%META:FILEATTACHMENT{name="Smile.gif" version="1" ... }%
%META:TOPICINFO{version="6" date="976762663" author="PeterThoeny"}%
%META:TOPICMOVED{from="Codev.OldName" by="JohnTalintyre" date="976762680"}%
%META:TOPICPARENT{name="NavigationByTopicContext"}%
%META:_END%

We should design the meta-data handling for speed at view time. Also, handling should be simple. Therefore I propose above format with these changes:

  • Always name parameters in meta-data for ease of parsing and ease of handling. I.e. use %META:CATEGORYITEM{name="OperatingSystem" value="OsWin"}% instead of %META:CATEGORYITEM{"OperatingSystem" value="OsWin"}%
  • Don't use RCS revision format because it could have unwanted side effects, i.e. when you roll back a topic revision. Already now text in this topic changes each time you save the topic. At topic save time we know the new version number based on the current revision head number (+0 if repRev, else +1)
  • Above meta-data is stored to disk after sorting it in alphabetical order. That's the reason for the underscore in %META:_END%.
  • On topic read we simply split meta-data from the rest, no need to sort. How about KISS, e.g. meta-data as a simple array?
  • Once we have an OO TWiki::Store (after the forthcoming release), we can encapsulate the meta-data nicely. Until then we simply pass ( $text, @metaData ) to the scripts, and the scripts need to pass ( $text, @metaData ) back to TWiki::Store. (Instead of the array, we could pass a pointer to the array)
  • Manipulating the meta-data is reserved to functions in TWiki::Store: Add, remove, replace meta-data items; Get, set parameters of a meta-data item.
    • Those functions are very simple to implement, i.e. add item is: Add at end, then sort meta-data.
    • Speed is not relevant here, because it is not at view time.
  • We can create a new renderMetaData function in TWiki.pm that goes through the meta-data sequentially and renders visible data, ignoring the rest. E.g. first the category table, then the attachment table.
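
The "add at end, then sort" idea and the named parameters can be sketched as below. The helper names addMetaItem and parseMetaItem are hypothetical and not part of TWiki::Store; the sketch assumes one meta-data item per array element and values without embedded double quotes. Plain string sorting puts %META:_END% last because the underscore sorts after the upper case letters in ASCII, and it puts an empty %META:CATEGORYITEM% header before its items because % sorts before {.

```perl
use strict;
use warnings;

# Hypothetical helper: add one item at the end, then re-sort so the
# stored order is always the same.
sub addMetaItem {
    my( $metaRef, $item ) = @_;
    push @$metaRef, $item;
    @$metaRef = sort @$metaRef;
    return;
}

# Hypothetical helper: parse one meta-data line into its type and a
# hash of its name="value" parameters.
sub parseMetaItem {
    my( $line ) = @_;
    return unless( $line =~ /^%META:(\w+)(?:\{(.*)\})?%$/ );
    my( $type, $args ) = ( $1, defined $2 ? $2 : "" );
    my %params;
    while( $args =~ /(\w+)="([^"]*)"/g ) {
        $params{$1} = $2;
    }
    return ( $type, \%params );
}
```

Because every parameter is named, a caller can ask for $params->{version} without knowing the position of the value inside the braces.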

-- PeterThoeny - 15 Apr 2001

A hook system for meta-data rendering might be good. ie. the ability to register a function to render meta-data segments as needed.
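
One possible shape for such a hook system, as a minimal sketch only: the names registerMetaRenderer and renderMetaData, and the convention that a handler receives the raw argument string, are assumptions made for illustration and not an existing TWiki interface.

```perl
use strict;
use warnings;

my %metaRenderers;    # meta-data type => code reference (hypothetical registry)

# Register a function that renders one type of meta-data.
sub registerMetaRenderer {
    my( $type, $handler ) = @_;
    $metaRenderers{$type} = $handler;
    return;
}

# Go through the meta-data sequentially; types without a registered
# renderer are ignored.
sub renderMetaData {
    my( @metaData ) = @_;
    my $out = "";
    foreach my $line ( @metaData ) {
        next unless( $line =~ /^%META:(\w+)(?:\{(.*)\})?%$/ );
        my( $type, $args ) = ( $1, defined $2 ? $2 : "" );
        my $handler = $metaRenderers{$type};
        next unless( $handler );
        $out .= $handler->( $args );
    }
    return $out;
}
```

A caller would register, say, a TOPICPARENT handler once at start-up and then pass the meta-data array to renderMetaData at view time.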

-- NicholasLee - 16 Apr 2001

Should I ask, why do we still need the empty %META:CATEGORYITEM% tags? It seems that with the sorting these should not be necessary.

On the ordering issue, rendering of items like the Category lists should probably not be dependent on the ordering of the meta-data, after all it should reflect the category definitions set forth by the user/administrator. I guess that you realized that when you proposed ASCII-betic sorting of the meta-data.

On another thing, maybe it's just me being heretic, but I don't see the need for an OO encapsulation of something that is just data. And yes, I do believe that many times an Object Oriented structure is just overkill (and a performance drain), when a traditional procedural one would suffice, especially when the only sections of the code that would be affected are two procedures inside TWiki::Store.

-- EdgarBrown - 17 Apr 2001

I agree about not needing the empty tags, including the END one. An OO wrap for updating the data, would be nice, and speed would not be an issue for that, we can worry about this in the future.

I think the order of output should have nothing to do with the order of the meta tags.

I'm concerned that determining the RCS number by just incrementing by one will fail in some circumstances, which could get very confusing. Initially a check using rcsinfo could be a good idea.

I think a hook for changing format is a logical extension, possibly another plugin function? But, I suggest we keep as just a core function for next release.

-- JohnTalintyre - 17 Apr 2001

I've now done a first cut.

  • Use array for meta data
  • Includes topicinfo data - currently still use $...$ RCS format
  • Includes support for ATTACHMENTS, upgrade meta on store and temporarily when reading

First cut code is attached, will make it to CVS real soon.

-- JohnTalintyre - 18 Apr 2001

The reason for the empty %META:CATEGORYITEM% is to specify the table header. And yes, we know the start and end of the category items, but including the header makes the rendering part easier and faster. John: I have not looked at your code, but go ahead and put it into Alpha. I'll work on the category table changes next week. (That's it for now. Currently travelling in Las Vegas, and without much time.)

-- PeterThoeny - 18 Apr 2001

First version now in CVS. Features:

  • Meta data stored at top of file in %META:xxx format
  • Meta data separated from topic text when reading
  • Revision info added directly no $xxx$ format
  • Records topic moves (and offer to unmove)
  • Attachment in new format

Still lots to do, including finishing re-factoring (get rid of xxNew subs). Haven't put in TOPICPARENT yet, still in INFO and CATEGORY to be done by Peter after this week.

Format used is:

%META:CATEGORYITEM{name="OperatingSystem" value="OsWin"}%
%META:CATEGORYITEM{name="TopicClassification" value="PublicFAQ"}%
%META:FILEATTACHMENT{name="Sample.txt" version="3" ... }%
%META:FILEATTACHMENT{name="Smile.gif" version="1" ... }%
%META:TOPICINFO{version="6" date="976762663" author="PeterThoeny" version="1.0beta1"}%
%META:TOPICMOVED{from="Codev.OldName" to="CoDev.NewName" by="JohnTalintyre" date="976762680"}%
%META:TOPICPARENT{name="NavigationByTopicContext"}%

Notes:

  • I still think we should lose empty variables, Peter still keen to keep, final decision with Peter next week.
  • version in TOPICINFO is the version of the format being used - 1.0 will be first with META tags, might be useful when doing automatic version upgrades in the future.
  • TOPICMOVED only lives in the destination topic - I thought about putting a source topic as well, but that brought up a lot of issues. If topic not found, could do a search of META:TOPICMOVED tags, extra param to search?
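
The idea of searching the META:TOPICMOVED tags when a topic is not found might look roughly like this. It is a sketch under assumptions: findMovedTopic is a hypothetical helper, and the topics are passed in as a hash of topic name to raw text that is assumed to have been read from the store already, rather than found through the real search script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of the topic that records a move
# from $oldName, or nothing if no topic does.
sub findMovedTopic {
    my( $oldName, %topics ) = @_;
    foreach my $topic ( sort keys %topics ) {
        if( $topics{$topic} =~ /^%META:TOPICMOVED\{[^}]*\bfrom="\Q$oldName\E"/m ) {
            return $topic;
        }
    }
    return;    # not found
}
```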

The code is starting to look a lot cleaner, although I'm still unsure of effects on diff output, search etc.

-- JohnTalintyre - 19 Apr 2001

Just out of curiosity, is there any pressing reason for the date to be in system ticks???

-- EdgarBrown - 25 Apr 2001

Also, can I just check that the set ALLOWTOPICCHANGE and EDITORBOXWIDTH variables would also be captured in the meta data?

-- MartinCleaver - 26 Apr 2001

I can't see any particular reason for date to be in system ticks, this was Peter's original suggestion. However, it does make it easier to render in any format required.

I had wondered about the permissions variables. They would make sense as meta data, but we'd need a way of setting them and displaying them. At present they are as before. Any suggestions for a simple and clean mechanism?

-- JohnTalintyre - 26 Apr 2001

Time stamp in system ticks is more flexible and is known at topic save time. Storing the time stamp of topics for e-mail notification is also based on system ticks, that makes calculations like one hour inactivity easy.
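
To illustrate why ticks are convenient, here is a small sketch. The function names inactiveFor and formatTicks are made up for this example, and the display format is just one possibility; only the core Perl functions time, gmtime and sprintf are used.

```perl
use strict;
use warnings;

# True if at least $seconds have passed since the stored time stamp.
# $now is optional and defaults to the current time.
sub inactiveFor {
    my( $savedTicks, $seconds, $now ) = @_;
    $now = time() unless( defined $now );
    return ( $now - $savedTicks ) >= $seconds;
}

# Render epoch seconds in one possible display format (GMT).
sub formatTicks {
    my( $ticks ) = @_;
    my( $sec, $min, $hour, $mday, $mon, $year ) = gmtime( $ticks );
    return sprintf( "%04d-%02d-%02d %02d:%02d",
                    $year + 1900, $mon + 1, $mday, $hour, $min );
}
```

The one hour inactivity check is then a single subtraction, e.g. inactiveFor( $date, 3600 ), and the same stored value can be rendered in whatever format a skin asks for.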

META:TOPICINFO in above example has two versions="". Better to have one version="" that has the top revision and a new TWiki version meta-data:

%META:TWIKIVERSION{version="2.0"}%

More details in TWikiVersionNumberingScheme.

BTW, I escaped the RCS "$key: value$" in above examples because diff shows unnecessary changes since RCS updates those RCS keys on each topic save.

BTW, we hit the edit size limit on this topic for Netscape. Need to refactor.

-- PeterThoeny - 27 Apr 2001

I've done a few fixes over the last week. I hope to integrate into CVS on Monday.

-- JohnTalintyre - 27 Apr 2001

The differences display now always shows META:TOPICINFO and doesn't show other meta data well.

At present rcsdiff is called and its output parsed. I've done a crude adjustment to this output, but I think it would clearly be better to filter the input to the diff command.

My thought is this. Read in topic revisions, apply meta rendering, possibly with some difference to normal meta rendering. Feed output to Perl based differences code, then render.

Algorithm::Diff http://search.cpan.org/doc/NEDKONZ/Algorithm-Diff-1.10/lib/Algorithm/Diff.pm looks pretty good and has the advantage that we reduce an rcs dependency and a dependence on diff (which can cause problems on Windows). Adding extra CPAN dependencies is a shame, but this could be put in the TWiki distribution itself (it's only one .pm file).

Thoughts?

-- JohnTalintyre - 09 May 2001

Mmmmm.... reinventing the wheel again.....

I don't like the idea of having yet another module, especially when system calls can do it nicely.

Why not just use: rcsdiff -I "%META:.*{.*}" ... ... ??? (rcs just uses diff, so the same options should work, it sure works with diff).

-- EdgarBrown - 09 May 2001

An interesting idea. A few comments:

  1. The -I option doesn't completely work, as we don't want to ignore all the meta information (at least I don't think we want to ignore category and attachment information). If we just try to ignore TOPICINFO, this will only work if the block identified by diff contains only TOPICINFO.
  2. I don't think there's any reinventing of the wheel here - Diff.pm already exists.
  3. Given we want to have the option of using alternatives to RCS, this helps by reducing the dependency on RCS.
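
Filtering the input rather than the diff output, as in point 1, could be as small as the following sketch. stripVolatileMeta is a hypothetical name, and the sketch only assumes that TOPICINFO is the entry that changes on every save; category and attachment lines are kept so that their changes still show up.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the META:TOPICINFO line from a revision's
# text before it is handed to whatever diff is used.
sub stripVolatileMeta {
    my( $text ) = @_;
    $text =~ s/^%META:TOPICINFO\{.*\}%\n?//mg;
    return $text;
}
```

Both revisions would be passed through this filter first, so the comparison never sees TOPICINFO regardless of which block diff would have grouped it into.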

-- JohnTalintyre - 10 May 2001

I would still be concerned about performance issues: diff is a fairly optimized piece of code, running natively on the machine. The Perl counterpart needs to be re-compiled, and relies on continuous compilation of regexps to do its job (the execution time problems with CalendarPluginDev come to mind).

Besides any code that keeps version control has to have a diff facility built in, so if you are concerned that such generic facility would not support the -I switch or an equivalent, it might be better to remove metadata after retrieving the diffs anyway. We might actually want to render it differently, like attachment versions, or topic movements. So after the fact processing seems to be the way to go.

-- EdgarBrown - 10 May 2001

I have an existing TWiki setup using the 01 Dec 2000 release and have been experimenting with the latest alpha. The existing revision info of all the existing topics is unknown to the alpha code. If I edit the TWikiMetaData and set the version appropriately, the revisions are displayed. Is there going to be an upgrade script to handle this or will the topic format be backwards-compatible?

-- JohnAmidon - 01 Jun 2001

Sounds like you've found a bug. The alpha release is supposed to work with old and new; if there is no meta information on revisions, it should be taken from RCS in the usual way. After comments from Peter I decided to avoid the upgrade script route. This means the codebase is somewhat more complicated than required, but old files and old revisions should still work fine. When you do a topic save the format is upgraded. I will check this functionality again.

(Note: this was probably due to an incorrect setting of the useRcsDir variable in TWiki.cfg.)

-- JohnTalintyre - 02 Jun 2001

See CategoriesAndParents

-- JohnTalintyre - 12 Jun 2001

Format definition now in MetaDataDefinition

-- JohnTalintyre - 21 Jun 2001

Basic support for meta data rendering now in CVS - see MetaDataRendering

-- JohnTalintyre - 26 Jun 2001

After the code got into a real mess when internally holding meta data in a list, I refactored to use a simple class Meta.pm that stores data as:

  • hash entry for each type of meta data
    • this is an array of entries (order maintained)
      • each item in array is a hash of key/value pairs.
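
As a rough picture of that layout (not the actual Meta.pm code), the structure can be written as a plain Perl hash of arrays of hashes. The accessor findMeta is hypothetical and only shows how a lookup by name might work.

```perl
use strict;
use warnings;

# Layout sketch: type => [ { key => value, ... }, ... ]
my %meta = (
    FILEATTACHMENT => [
        { name => "Sample.txt", version => "3" },
        { name => "Smile.gif",  version => "1" },
    ],
    TOPICPARENT => [
        { name => "NavigationByTopicContext" },
    ],
);

# Hypothetical accessor: find the entry of a given type by its name.
sub findMeta {
    my( $metaRef, $type, $name ) = @_;
    foreach my $item ( @{ $metaRef->{$type} || [] } ) {
        return $item if( $item->{name} eq $name );
    }
    return undef;
}
```

The array keeps the order of attachments or category items, while the inner hash gives direct access to each key, e.g. $meta{FILEATTACHMENT}[0]{name}.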

Pretty comprehensive unit test exists in twiki/tools/test/TestMeta.pm

-- JohnTalintyre - 04 Jul 2001

Ok, after that change, does it still make sense to partition meta-data in two blocks, one before the topic text and one after the topic text?

I understand that this was for rendering purposes, but since all the meta-data is in a separate structure such partition doesn't seem to make sense anymore.

-- EdgarBrown - 19 Jul 2001

The separation to start and end of topic was done when the Meta class was introduced and the idea was that diff would then look a bit more sensible. Personally I'd rather see all meta data at the top of the file with guarantee that it's all grouped together (more efficient for searches and gathering of meta data). But, allowing meta data anywhere is possibly more robust. It would also be good to see a Perl based diff so that data can be filtered before the diff.

-- JohnTalintyre - 20 Jul 2001

My desire was to be able to render some meta-data before the text and some after. As long as I have that flexibility, I don't care where the data is stored. (Just confirming what Edgar and John say above, from a user POV.)

I suspect the other gentleman who made the request had the same desire. (I thought I had made a comment on this page -- maybe it was on some other page -- doesn't matter.)

-- RandyKramer - 20 Jul 2001

Topic attachments

  • Twiki.zip (Zip archive, 46.8 K, 2001-04-18 17:19, JohnTalintyre): First cut meta implementation
  • getMeta.pm (Perl source, 1.0 K, 2001-04-14 22:19, EdgarBrown): Example of metadata extraction in a hash of arrays
  • metadata.txt (Text, 0.9 K, 2001-04-14 23:01, EdgarBrown): Metadata example in the proposed format
Topic revision: r49 - 2005-02-15 - SamHasler
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.