CategoryStale
Historic information from AttachmentsUnderRevisionControl has been put here, so that the old topic can be used for current issues.

-- JohnTalintyre - 15 Mar 2001

Can attachments be put under revision control? Please?

We are using a corporate supported system called "Livelink Intranet" that provides features similar to TWiki with the added advantage of higher complexity, being extremely slow, and hard to search (it may cost more, but it's harder to use :-).

While this commercial system has a couple of features (e.g. discussions and to-do lists) that can be emulated using TWiki in a different fashion, the big feature TWiki is missing is the ability to checkin and checkout attachment files with all the expected locking mechanisms implied by checkin and checkout. We use the commercial system to handle revision control of (mostly) Word and Visio documents, which are more difficult to handle within the CVS paradigm because it is extremely hard to merge a binary file during checkin.

The FileAttachment section at the bottom of TWiki pages needs to change the update action to a checkin and add a checkout action. The graphic (or graphics) at the far left of the file line should indicate that the file is checked out. We also need access to older versions of the file, similar to the bar at the bottom of all TWiki pages.

The commercial system has the added feature of being able to convert many document types from their native formats to HTML on the fly, but most people do not use this because the conversion is pretty poor; e.g. text in Word docs is converted, but figures are tossed. I wouldn't consider file conversion a requirement, but it might be nice to be able to send correct MIME information so that an appropriate filter could be triggered on the web browser.

-- JohnAltstadt - 07 Apr 2000


I see the need for exclusive locking of FileAttachments in certain environments. It could be a new feature in TWiki; however, it should be optional, e.g. a new flag in wikicfg.pm would tell whether to use the current non-locking mechanism or a new locking one. Regardless of the flag, a revision history of attachments to access older revisions would be a nice new feature.

See AutoConvertAttachmentsToHtml regarding converting documents from their native formats to HTML on the fly.

-- PeterThoeny - 09 Apr 2000

We're keen to have attachments under revision control. To that end I've started working on refactoring twikistore.pm to allow this. So far:

  • Written unit tests for twikistore.pm
  • Tested basic ability to update RCS files for attachments (including binary)

To do:

  • New templates and attach.pl code - I'm thinking along the lines of clicking "action..." rather than update; this will list all versions of the attachment and allow update
  • Exclusive locking. I want to offer this as a site preference; if it is on, then allow users to take an exclusive lock if they want to
  • Wondering about using the text comment in RCS to store the comment on the attachment, so that the list of revisions can include it
  • Wondering whether to use labels so users can have their own version numbers
  • Rename is also good to have; parts of RenameWeb are appropriate, but for data and pub files it would be methods in twikistore.
  • Keen to have a delete function for attachments, probably based on TrashCanWeb. The twikistore functions would essentially be the same as for rename.

I notice that a couple of open source indexing/search engines have converters to text for Word and PDF files. Whilst not offering HTML, this does at least offer the ability to search the whole site including attachments. If HTML is required for Word, I reckon it's hard to beat using Word via COM to do this, although that is a pain in a purely Unix setup.

-- JohnTalintyre - 30 Jan 2001

Some sort of simple file manipulation scheme would work easily. Otherwise rewriting the storage mechanism is required, as basically you need TWiki to be able to deal with pub/ in the same manner it deals with data/.

With the simple mechanism TWiki would store attachments with their version number encoded in their file name,

i.e. picture.jpg-1.0 and picture.jpg-1.1

You then provide an indication of the latest version via a symlink, i.e. picture.jpg -> picture.jpg-1.1

Locking could be done with a .lock file. Note this requires minimal changes in wikistore.pm and the view and attach scripts.
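To make the simple scheme concrete, here is a minimal sketch in Python (TWiki itself is Perl; this only illustrates the idea). The function names, the lock file name and the choice to start numbering at 1.0 are assumptions for illustration, not anything that exists in TWiki.

```python
import os
import re


def next_version(directory, name):
    """Return the next "1.N" version for `name`, based on files already stored."""
    pattern = re.compile(re.escape(name) + r"-1\.(\d+)$")
    minors = []
    for entry in os.listdir(directory):
        match = pattern.match(entry)
        if match:
            minors.append(int(match.group(1)))
    return "1.%d" % (max(minors) + 1 if minors else 0)


def store_attachment(directory, name, data, user):
    """Store `data` as a new version of `name` and repoint the symlink at it."""
    lock_path = os.path.join(directory, name + ".lock")  # hypothetical lock name
    # O_EXCL makes this fail with FileExistsError if someone else holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(user)
        version = next_version(directory, name)
        versioned = "%s-%s" % (name, version)
        with open(os.path.join(directory, versioned), "wb") as out:
            out.write(data)
        # Build the new symlink under a temporary name, then rename it into
        # place so that picture.jpg always points at some complete version.
        link = os.path.join(directory, name)
        tmp_link = link + ".tmp"
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(versioned, tmp_link)
        os.replace(tmp_link, link)
        return version
    finally:
        os.remove(lock_path)
```

Calling store_attachment twice for picture.jpg leaves picture.jpg-1.0 and picture.jpg-1.1 on disk with picture.jpg linked to the latter. The symlink step is the part that would not carry over to file systems without links.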

A more complex scheme with TWiki explicit file type knowledge requires a lot of work.

-- NicholasLee - 30 Jan 2001

I would like to ask that symlinks not be used... that's right, I work in a WinNT shop, and versioning of attached Word files is something that we would like. Admittedly we would be tempted to just throw these files into VSS for versioning, but our stupid file system can't do links.

-- SvenDowideit - 30 Jan 2001

I already have a first cut working with wikistore and RCS - whilst changes are not minor, they're also not that complex.

To me it would seem very odd to use a version control system for pages, but a different mechanism for attachments. There is the clear downside that in the present model the disc space usage goes up a lot (file and RCS file, so double usage for 1 version) - this could be removed, but I don't think it is worth it.

-- JohnTalintyre - 31 Jan 2001

John, thank you for working on this and offering it back to the TWiki community. Please note that we just did a first step in modularizing TWiki; wikistore.pm was just part of the wiki.pm package. Now it is its own TWiki::Store module that has all file I/O stuff (more in HowShouldTWikiBeModularized). See also some previous ideas at DeleteAttachment.

You are right, all file manipulation actions should be done in TWiki::Store. E.g. attach file, get previous file version, get diff (??? of ASCII files only?), lock, unlock, delete and rename file. Backend part of rename and delete should be designed together with topic rename and delete.

We need to design the UI carefully. It should be easy and intuitive. KISS is the way to go. I would not go for text comments nor labels in RCS, because people can place a comment in the topic, and the topic text is under revision control.

A note on the attachment table:

  • The file name is a link to the uploaded file in the twiki/pub/Web/Topic directory. Once versioned, it should show the file version that was active at the time of the previous topic version you are looking at.
  • View is executing the viewfile script that basically delivers the same file as the file name link. Was intended to be used for viewing AutoConvertAttachmentsToHtml files.
  • Update shows the upload screen. As John points out it could be renamed and used for actions on the file, like view previous versions, diffs, rename, delete, lock/unlock/break lock.
  • Size, Date, Who and Comment should show the file attributes of the version you are looking at (e.g. when not looking at the latest topic revision)
  • Who is the person who uploaded the file. In case locked, probably should show the person who locked the file, together with a "locked" indication.

Question how to handle version numbering:

  1. Keep them in synch with the topic version number, that means uploading a file to topic 1.6 would create a 1.7 topic and an attachment with version 1.7. It also means that uploading the same file within one hour will not generate a new version. Depending on how you look at it, this is a feature or a bug ;-) The advantage is that it would be transparent to the user; i.e. viewing a previous 1.4 topic version would show the file versions that have been active at that time.
  2. Or, keep version numbering of attachments and topics separate. I.e. attach a new file to a 1.4 topic makes attachment 1.1, at a later time when you attach the same file to the same topic with version 1.7, it will get 1.2.

1. is the KISS way I would vote for. Opinions?

I would not worry to store the file twice, once in pub and then as a RCS repository.

Another question is how to distinguish between binary and text files. RCS needs to know that. Should there be a preferences variable with text file extensions (or suffix rules)?

-- PeterThoeny - 18 Feb 2001

The move to all "file" access being in TWiki::Store is great. I attach my work in progress on adding revision control to the old wikistore.pm.

On the KISS principle I reckon each attachment having its own versions is a good idea. I might be missing something, but I can't see the benefit of synching to the topic version, except to show the correct version of an attachment when viewing an old topic. If this is the case, then would just using dates be easier? It's not something that would be used often, so why not do it on view rather than update. Alternatively, perhaps just change the category table to include the version.

At present I'm assuming all attachments are (possibly) binary, because I couldn't think of a simple alternative.

When listing all the versions of an attachment it's good to show the comments. This is not so easy to get at if they are held in the RCS history of the topic. Whilst it would be duplication of data, I would think it a good idea to store them in the attachment RCS history as well as in the topic. This is also handy if an attachment is moved to another topic.

I also looked at RenameTopic as part of this. Ideas on that page.

-- JohnTalintyre - 19 Feb 2001

I've now moved code to Store.pm. Next steps:

  • Rename attachment - need to rewrite the attachments table at the backend. The front end currently has a page for each attachment, so it is probably best to have the rename option on that page. To start with, leave the user to resolve any references.
  • Delete attachment can be initiated from the same page, but there are some issues.
    • Delete by removing from the attachment table. Would it stay in the pub directory or move to trashpub? Then how could undelete be done from an interface perspective? We could have the concept of "hidden" files for a topic. This could also be useful for keeping the list of attachments down, e.g. when there are lots of images. Hidden files could have the . prefix as usual, so if I delete xyz.txt it would be removed from the attachment table and its name changed to .xyz.txt. When viewing there would be a "ShowHidden" link (ls pub/web/topic/.*). The change in name would mean existing links wouldn't work - a desired effect I think.
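A rough sketch of the hidden-file idea, in Python purely for illustration. The function names are made up, and updating the attachment table is left out.

```python
import os


def hide_attachment(topic_dir, name):
    """"Delete" an attachment by renaming xyz.txt to .xyz.txt."""
    hidden = "." + name
    os.rename(os.path.join(topic_dir, name), os.path.join(topic_dir, hidden))
    return hidden


def unhide_attachment(topic_dir, name):
    """Undo hide_attachment: rename .xyz.txt back to xyz.txt."""
    os.rename(os.path.join(topic_dir, "." + name), os.path.join(topic_dir, name))


def list_hidden(topic_dir):
    """What a "ShowHidden" link would list (like ls pub/web/topic/.*)."""
    return sorted(f for f in os.listdir(topic_dir) if f.startswith("."))
```

Because the file is only renamed, undelete is a second rename, and old links to xyz.txt stop resolving as soon as it is hidden.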

Thoughts?

-- JohnTalintyre - 26 Feb 2001

Initially we should work on revision control of file attachments because it adds most value to TWiki. Other actions like rename, move or delete attachment, or diff, undo attachment versions can be done later.

Attachment link: We simply need to show the correct version of an attachment when looking at an older topic revision. You are right, the attachment version number is not relevant. Using date for attachments can be tricky because a topic revision does not change when you update it within one hour.

Idea: We could store that attachment table in the topic not as an HTML table (current implementation) but as new %FILEATTACHMENT{..}% variables that get expanded at view time. Example:

%FILEATTACHMENT%
%FILEATTACHMENT{"Sample.txt" version="1.3" path="C:\DATA\Sample.txt" size="123" date="976762663" user="Main.PeterThoeny" comment="sample text file" }%
%FILEATTACHMENT{"Image.gif" ... }%

The first line gets expanded to the table header, e.g.

| FileAttachment: | Action: | Size: | Date: | Who: | Comment: |

Subsequent file attachment lines get expanded, i.e.

| Sample.txt | action | 123 | 18 Jan 2001 | PeterThoeny | sample text file |

This is fast for the top revision because no RCS lookup is necessary, simply use "%ATTACHURLPATH%/Sample.txt" for the file link (assuming that we keep the redundant file in the pub directory). How to present the file link when looking at a previous topic revision? We can store the attachment version number in the %FILEATTACHMENT{...}%, that way we know the active version at that time. The modified viewfile script can deliver a previous attachment version, i.e. viewfile/Web/TopicName/Sample.txt?ver=1.1. It is important to append the filename to the extended path so that the browser can pick up the filename correctly.
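As a sketch of how such a line could be read back and turned into a file link (Python for illustration only; the regular expressions and function names are assumptions, and a real parser would need to cope with quotes or "}%" inside a comment):

```python
import re

# Matches %FILEATTACHMENT{"name" key="value" ...}% ; the bare
# %FILEATTACHMENT% header line deliberately does not match.
LINE_RE = re.compile(r'%FILEATTACHMENT\{\s*"([^"]*)"(.*?)\}%')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_attachment(line):
    """Return a dict of the attributes on one attachment line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    attrs = dict(ATTR_RE.findall(match.group(2)))
    attrs["name"] = match.group(1)
    return attrs


def file_link(attrs, web, topic, viewing_latest):
    """Direct pub link for the top revision, viewfile with ?ver= otherwise."""
    if viewing_latest:
        return "%ATTACHURLPATH%/" + attrs["name"]
    return "viewfile/%s/%s/%s?ver=%s" % (web, topic, attrs["name"], attrs["version"])
```

For the Sample.txt line above, viewed from an older topic revision, this gives viewfile/Web/TopicName/Sample.txt?ver=1.3, with the file name kept at the end of the path.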

What about current file attachments? It should be backward compatible. The attach and upload scripts should be able to read both formats. The upload script stores the table only in the new format.

Binary vs. text: How about a new variable like

  • Set TEXTFILESUFFIX = .txt, .html, .htm, .h, .c, .cpp, .pm, .pl

Text files would be stored in text format, other files in binary. The question is how to handle exceptions, i.e. a file that has been checked in as binary, then its file extension has been added to %TEXTFILESUFFIX%, and the user finally tries to make a diff. Or the other way around.
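A small sketch of the suffix check, in Python for illustration. The rule for the exception case (keep whatever mode the file was first checked in with) is only one possible answer to the question above, and all names here are hypothetical.

```python
import os


def parse_suffixes(setting):
    """Turn ".txt, .html, .htm" into a set of lower-case suffixes."""
    return {s.strip().lower() for s in setting.split(",") if s.strip()}


def is_text_file(filename, suffixes):
    """True if the file's extension is listed in the suffix setting."""
    return os.path.splitext(filename)[1].lower() in suffixes


def storage_mode(filename, suffixes, recorded_mode=None):
    """Pick "text" or "binary" for a check-in.

    Assumption: once a file has a recorded mode, later changes to the
    suffix list do not switch it, so old revisions stay readable.
    """
    if recorded_mode is not None:
        return recorded_mode
    return "text" if is_text_file(filename, suffixes) else "binary"
```

With this rule a diff would be offered only when the recorded mode is "text", regardless of what the suffix list says today.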

Comments: No problem if we store it twice (in topic and as RCS comment)

-- PeterThoeny - 26 Feb 2001

Peter, all excellent thoughts. A minor consideration: could we add a hidden and possibly a deleted attribute to %FILEATTACHMENT%? I thought it was worth toying with rename as it might give some extra ideas. For instance, people have asked for the option of attachments under revision control. Given this, it is probably best to continue to support pub access, and deletes can't simply be removal of the file if recovery is to be possible.

I can't remember if I've raised this issue. But, does it matter that "*,v" files are readable by Web clients?

-- JohnTalintyre - 26 Feb 2001

Sounds good, we can add an attr="h" to %FILEATTACHMENT%. The value "h" means hidden, could also be "d" for deleted or a combination like "hd".
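A sketch of how the attr flags might be interpreted when building the attachment table (Python for illustration; the function names and the choice never to list deleted files are assumptions):

```python
def parse_attr(attr):
    """Decode an attr value such as "", "h", "d" or "hd" into flags."""
    return {"hidden": "h" in attr, "deleted": "d" in attr}


def visible_attachments(attachments, show_hidden=False):
    """Filter a list of attachment dicts down to the rows to display."""
    rows = []
    for attachment in attachments:
        flags = parse_attr(attachment.get("attr", ""))
        if flags["deleted"]:
            continue  # assumption: deleted files are never listed
        if flags["hidden"] and not show_hidden:
            continue
        rows.append(attachment)
    return rows
```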

I don't think that it is a problem that "*,v" files are readable by Web clients. But wait, it can be an issue in case we offer read restriction based on topic level (vs. web level). In that case we need to place read restrictions on attachments of selected topics (including "*,v" files).

-- PeterThoeny - 26 Feb 2001

Changes compatible with new structure in attached zip. I would appreciate it if someone merged into CVS or gave me access to do it. Code isn't finished yet, but is I think well worth putting into CVS at this time.

-- JohnTalintyre - 27 Feb 2001

I'm still looking for the best way forward on access to RCS histories.

| Idea: | Comment: |
| Always read access unless blocked because topic read blocked | Messy - given that there's no reason for any history to be available and because protection would have to be in Apache rather than TWiki |
| Always use viewfile to get at page - RCS blocked as for .txt with view script | Performance will be impacted (does this matter?) %ATTACHURL% modification should deal with internal access. Access from outside TWiki could be done by "re-directs". |
| RCS directory for each pub topic directory | Can Apache easily be configured to protect? What about other Web servers? I think the rcs files could be in a totally separate location, have to check, but the likely effect is a change to many rcs calls |

Still tidying code (meet style in ReadmeFirst included). Also, about to do %FILEATTACHMENT% change.

-- JohnTalintyre - 06 Mar 2001

Instead of trying to deal with the %FILEATTACHMENT% format and the old format at the same time, I'd suggest writing a Perl script that updates the old format to the new as part of the TWiki update process.

-- AlainPenders - 07 Mar 2001

Alain. A good point. It would certainly mean we had cleaner code in the distribution. Interesting to see what others think.

Suggested way forward on the location of history files - put them off the data directory, e.g. Web = aweb, Topic = ATopic, Attachment = aAttachment

data/aweb/ATopic.txt
data/aweb/ATopic/aAttachment.txt,v

Note - didn't take long to modify Store.pm to do this.

-- JohnTalintyre - 07 Mar 2001

Accessing file attachment repository files: How about making it as simple as possible? Place the RCS files in the same pub directory where the attachment is. This simplifies the code, and the user can access the attachment as before (there is lots of text like %ATTACHURL%/image.gif). The downside of this is that individual topics can't be easily view protected (unless we generate a .htaccess file for each directory). So this would be more a "hide by obscurity" design for attachments. File location using your example:

data/aweb/ATopic.txt topic text
data/aweb/ATopic.txt,v topic repository
pub/aweb/ATopic/aAttachment.txt file attachment
pub/aweb/ATopic/aAttachment.txt,v file attachment repository
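Since the proposals differ mainly in where the ",v" file lives, the choice can be kept in one place. A minimal sketch in Python (illustration only; the function name and the rcs_area parameter are made up):

```python
import posixpath


def attachment_paths(web, topic, name, rcs_area="pub"):
    """Return (attachment file, RCS repository file) relative to the TWiki root.

    rcs_area="pub" keeps the ,v file next to the attachment;
    rcs_area="data" puts it off the data directory instead.
    """
    file_path = posixpath.join("pub", web, topic, name)
    rcs_path = posixpath.join(rcs_area, web, topic, name + ",v")
    return file_path, rcs_path
```

attachment_paths("aweb", "ATopic", "aAttachment.txt") returns the two pub paths listed above; passing rcs_area="data" gives data/aweb/ATopic/aAttachment.txt,v for the repository file.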

Updating old file attachment format: I strongly suggest keeping the code compatible with older formats. What if the user restores older webs from a backup? I don't want to be in a situation where I can't read an older MS Word document using MS Word (Micro$oft bashing can be fun sometimes ;-) ). As John said, the code will not be as clean, but TWiki should always be able to read an older format. The code can be made readable by splitting up the file attachment table read into two functions, the old one and the new one.

-- PeterThoeny - 07 Mar 2001

Can attachments be put under access (read) control?

-- SteveRoe - 09 Mar 2001

Read access control of attachments? Good question. Maybe we should design for that in the first place. :-)

If we do that we need to move the file attachments out of twiki/pub to a directory that is not under the htdocs directory tree. Attachments would need to be dynamically served by the viewfile script. As John mentioned, this raises a performance question. I think performance is not a big issue for accessing regular attachments like Word files, but can be an issue for inline images (<img src="..."> tag). If we put the attachments out of the htdocs tree I suggest this:

data/aweb/ATopic.txt topic text
data/aweb/ATopic.txt,v topic repository
data/aweb/ATopic/aAttachment.txt file attachment
data/aweb/ATopic/aAttachment.txt,v file attachment repository

In this case we need to find a way to serve existing links like %ATTACHURL%/image.gif or %PUBURL%/%WEB%/OtherTopic/image.gif.

John, what do you think?

-- PeterThoeny - 07 Mar 2001

How about letting the user who attaches a file decide whether or not it is readable to all? Only files that aren't readable to all need to be accessed through the viewfile script. Then the attach form should be setup so that inline images are always readable to all... which makes sense as the person viewing it has already passed the read access rights of the page that contains the image, which should be sufficient.

-- AlainPenders - 10 Mar 2001

(posted Alain's e-mail sent to TWiki-Dev mailing list)

To me it makes sense to offer only one type of access control, the one we have per web (enabled with "Set ALLOWWEBVIEW" as described in TWikiAccessControl); and later per topic. Allowing an access control setting per attached file seems to raise the UI complexity too much. After all, if you need to have different access control for attachments, use different webs or different topics.

The question is how strong we should design access control for attachments. Is hiding by obscurity enough? In case yes, we could simply add a generic index.html in each pub web that hides the attached files. So an unauthorized user would need to know the topic name and the name of an attachment to access the file via URL. The current view access control is a "good enough security for many but not all cases" anyway, so we might just go for that. Take three of proposed directory structure is now:

data/aweb/ATopic.txt topic text
data/aweb/ATopic.txt,v topic repository
pub/aweb/index.html dummy index file that hides content
pub/aweb/ATopic/index.html dummy index file that hides content
pub/aweb/ATopic/aAttachment.txt file attachment
pub/aweb/ATopic/aAttachment.txt,v file attachment repository

-- PeterThoeny - 10 Mar 2001

I agree with Peter that one form of access control is preferable. For now the software can easily switch between attachment rcs files being in either data or pub. But, moving the attachments themselves will be a bit more difficult because of existing installations. I don't understand the point of the dummy index files as content isn't listed at present anyway.

Although messy, we could move attachments under read control to the data area, such that they were only available using viewfile. Having said this, this just makes me feel more strongly that it's worth trying to do all access via viewfile and hence allow all the attachments to be in the data directories.

On making %ATTACHURL%/image.gif work, can't we simply define %ATTACHURL%/ to be "%SCRIPTURLPATH%/viewfile?filename="? We'd have to ignore any leading "/s". Or make viewfile parse the URL. %PUBURL%/%WEB%/OtherTopic/image.gif is a bit more difficult. If we're to avoid rewriting the data, then I guess PUBURL could map to viewfile, which would detect that the filename param was missing and instead parse the URL as per other scripts.
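A sketch of the two lookups viewfile would need, in Python for illustration only. It assumes the extra path after the script name has the shape /Web/Topic or /Web/Topic/file, which is a guess about how the URLs would be laid out; the function name is made up.

```python
from urllib.parse import parse_qs


def resolve_request(path_info, query_string=""):
    """Work out (web, topic, filename, version) for a viewfile request."""
    parts = [p for p in path_info.split("/") if p]
    params = parse_qs(query_string)
    if "filename" in params:
        # %ATTACHURL%/image.gif style: filename param, leading "/"s ignored.
        web, topic = parts[0], parts[1]
        filename = params["filename"][0].lstrip("/")
    else:
        # %PUBURL%/%WEB%/OtherTopic/image.gif style: parse the URL path.
        web, topic, filename = parts[0], parts[1], parts[2]
    version = params.get("ver", [None])[0]
    return web, topic, filename, version
```

Either form ends up as the same (web, topic, filename) triple, so the access check and the RCS lookup only have to be written once.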

-- JohnTalintyre - 12 Mar 2001

The idea of placing a dummy index.html file into the directories is to hide the files in the directory. Most web servers display the file content of a directory if the index.html file is missing. You are correct, the attachment directories are not directly linked, however any user could type in the URL and see the file list.

%PUBURL% and %PUBURLPATH% are used elsewhere, i.e. for the TWiki logos, so we can't easily redefine them.

To simplify matters I'd suggest to keep the attachment files at the current pub location, place the RCS files there too, and create a dummy index.html. We could even generate a .htaccess file to disallow access of *,v files. (My 0.02c. But it's up to you to decide John, since you work on the implementation.)

-- PeterThoeny - 12 Mar 2001

I've been thinking about WhatDifferenceBetweenATopicAndAWeb? You might want to ponder in case it affects your proposed structure.

-- MartinCleaver - 12 Mar 2001

Topic revision: r2 - 2006-03-09 - CrawfordCurrie
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.