This topic grew out of reading about TrashCanWeb and RenameTopic, which inspired me with the idea of topics as objects.
I propose the following:
If you take the premise that an object's topic/page name is just another of its attributes, then you can keep a history of a page's place in its web. I would implement this as follows:
- A new topic is given a unique ID (filename) such as a textual representation of its webname and datetimestamp: "Codev20010330150843003.txt". This could be thought of as its Object ID (OID). (Please! not Unix ticks! The platform independent method should be a GMT/UTC string of the form: %WEBNAME%yyyymmddhhmmssttt.txt (web name, years, months, days, hours, minutes, seconds, milliseconds).)
- Within the file, keep a name field (perhaps hidden in a comment). This could also be just one of a collection of per-file (per-object) attributes. These could also be TWiki macros.
- When a user renames or moves a page, update the name field within the file and then check it in to RCS.
- The Page Name Service (PNS) is likewise updated and revisioned.
- The PNS is used during normal page edit/update processing to turn wiki words into the corresponding datetimestamp.txt filename.
- The PNS is also used to convert datetimestamp.txt names into human-readable topic links (wiki words and bracketed topic names) when the page is rendered for viewing.
Of course, the downside of this is that you are no longer able to read a page's name in a directory, but then one might argue that you shouldn't be doing direct editing of wiki pages anyway. It might also be somewhat of a security feature to obscure a page's name.
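As a rough illustration of the steps above, here is a minimal Python sketch of the OID format and a name lookup table. The `make_oid` helper and the `PageNameService` class are hypothetical names invented for this example, the table lives in memory only, and the RCS check-in and PNS revisioning steps are not shown.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def make_oid(web: str, created: datetime) -> str:
    """Return a file name of the form %WEBNAME%yyyymmddhhmmssttt.txt (UTC)."""
    utc = created.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return f"{web}{utc:%Y%m%d%H%M%S}{millis:03d}.txt"


class PageNameService:
    """Hypothetical in-memory PNS for one web: topic name <-> OID file name."""

    def __init__(self) -> None:
        self._oid_by_name: Dict[str, str] = {}
        self._name_by_oid: Dict[str, str] = {}

    def register(self, name: str, oid: str) -> None:
        if name in self._oid_by_name:
            raise ValueError(f"topic name already in use: {name}")
        self._oid_by_name[name] = oid
        self._name_by_oid[oid] = name

    def oid_for(self, name: str) -> Optional[str]:
        """Wiki word -> file name (used at edit/update time)."""
        return self._oid_by_name.get(name)

    def name_for(self, oid: str) -> Optional[str]:
        """File name -> wiki word (used when rendering a page)."""
        return self._name_by_oid.get(oid)

    def rename(self, old: str, new: str) -> None:
        """Point a new name at the same OID; the file itself never moves."""
        if new in self._oid_by_name:
            raise ValueError(f"topic name already in use: {new}")
        oid = self._oid_by_name.pop(old)  # KeyError if `old` is unknown
        self._oid_by_name[new] = oid
        self._name_by_oid[oid] = new
```

In this sketch a rename touches only the table, which is the property the proposal relies on: the file name, and therefore its revision history, stays put.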
What does using this name dereferencing buy?
- Makes renaming trivial:
- Change the text name (as above).
- Update the PNS.
- Topic references are resolved at "view rendering" time.
- Undoing a rename is done by reverting to the most recent previous name (this idea needs expanding).
- Provides a modest amount of privacy within the TWiki directory structure.
- Makes deleting easy:
- Remove the name/stamp association from the PNS.
- Or mark it deleted and have a "view deleted" capability?
- Preserves continuity of revisions.
- The attribute scheme is open ended for adding other features with minimal impact.
- "Author" records the creator of the page.
- "Last changed by" independent of RCS
- Access Handler Access Permissions ?!?
- Template !!
- Edit Handler
- View Handler
- Oiks! this is getting to be a lot like a complete restructuring - powerful too!
- Slices, dices, chops, spits into the wind, stands on Superman's cape, messes around with the junkyard dog and Jim.
--
DavidLeBlanc - 30 Mar 2001
See also PackageTWikiTopic, although there is some recent thinking that I haven't put up yet, mainly from the work I've been doing in the experimental branch.
--
NicholasLee - 30 Mar 2001
Some comments:
I like the idea of simplifying the moving and renaming of pages, and it sounds like the proposal does that.
I also like the idea of having a unique identifier for each page -- without having administered a TWiki yet, I can imagine becoming confused when pages start to move (change names).
However I have the following comments. I hope the proposal can be revised to consider the following (if it does not already):
Info: The swiki software used on the swiki farm (www.swiki.net) numbers each page in the swiki -- the numbers start at 1 and increment for each new page. In addition, a page can be called up by typing its name (or its number) in the browser address bar.
Comment: I like two things about the swiki implementation:
- Because page names are usually easier to remember (more mnemonic) than page numbers, I like to call up pages by typing the name directly into (or editing) the address bar. (I could not tell for sure from the proposal whether that would still be possible.)
- I'm starting to remember some of the numbers on swiki, so I sometimes type those in. I can remember some frequently used numbers (like 1, 14, 132), but would have a hard time remembering (or even typing, if I had it written down) a number like "Codev20010330150843003.txt"
Besides, I'm not sure having the time stamp within the file name is any more useful than a serially assigned number. If you are interested in which page came first, the serially assigned number tells you that; if you want the time of creation, that's available in RCS, isn't it? (Or the original file creation date, at least in Linux -- I'm a Linux newbie, so I'm not sure whether the file creation date is preserved when a file is moved between directories. If it is preserved on Linux, I am fairly sure it is not preserved in Windows.)
[It is preserved on Windows if you use NTFS. JoachimDurchholz - 12 Sep 2001]
Other Comments:
- It sounds like this helps when renaming within a web, but keeping the web name in the file name doesn't help when moving pages between webs? (I agree that moving pages between webs is probably far less common than renaming within a web, but it does occur.)
- When I initially read the proposal, I was afraid that it would cause difficulties for my intent to use an external search engine to index and search the "raw" .txt files. I guess those difficulties are not insurmountable, but it would probably require some programming to at least display a user friendly page name for each hit rather than, for example, "Codev20010330150843003.txt".
--
RandyKramer - 31 Mar 2001
Good point about including the webname in the file/object name: there's no real need for that. Nor is it necessary to use a datetime stamp as the file name; it was what came to mind at the time as a uniquifier. I wonder how it would be possible to create a persistent counter for naming files though... particularly if it's possible in some scenario that there are two handler threads processing a TWiki request at the same time (as seems entirely possible in Apache for instance). The problem exists to some extent with using datetime stamps, which is why I specified it down to the millisecond under the assumption that it'd be darned hard for there to be a collision (check for this and add a uniquifier at the end?).
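One way to handle the "check for a collision and add a uniquifier" idea is to let the file system do the check. The Python sketch below is only an illustration under assumptions: the function name and the "-1", "-2" suffix scheme are made up here, and it relies on exclusive file creation being atomic, which holds on typical local file systems.

```python
import os


def create_unique_topic_file(directory: str, stamp: str, max_tries: int = 100) -> str:
    """Create an empty topic file named after `stamp`.

    If another handler already claimed the same stamp, append a numeric
    uniquifier (hypothetical scheme: "-1", "-2", ...) and try again.
    """
    for attempt in range(max_tries):
        suffix = "" if attempt == 0 else f"-{attempt}"
        path = os.path.join(directory, f"{stamp}{suffix}.txt")
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so the
            # existence check and the creation happen as one step.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        os.close(fd)
        return path
    raise RuntimeError(f"could not find a free name for stamp {stamp}")
```

Because the claim is made by creating the file, two handlers that compute the same millisecond stamp cannot both succeed on the same name, and no separate persistent counter is needed.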
As for typing a "friendly" name into the "go" box - there's no reason why the logic behind the "go" box couldn't look up the name in the PNS and convert it into the numeric filename. Likewise, it could see that it was a number and go directly to the requested file.
Likewise, a search engine could use the PNS to convert to a friendly name for display. It's also worth noting that having a fixed size for file names could be a bonus for a search database.
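The two lookups described above could be sketched as follows in Python. The function names are hypothetical, the PNS is stood in for by a plain dictionary from topic name to file name, and purely numeric file names are assumed (as in the webname-free variant discussed here).

```python
from typing import Dict, Optional


def resolve_go_entry(entry: str, oid_by_name: Dict[str, str]) -> Optional[str]:
    """Turn a "go" box entry into a file name.

    A purely numeric entry is taken to be the file number itself;
    anything else is looked up as a topic name.
    """
    text = entry.strip()
    if text.isdigit():
        return f"{text}.txt"
    return oid_by_name.get(text)


def display_name(filename: str, oid_by_name: Dict[str, str]) -> str:
    """Give a search hit a friendly name, falling back to the raw file name."""
    for name, oid in oid_by_name.items():
        if oid == filename:
            return name
    return filename
```

The reverse lookup here is a linear scan for brevity; a real table would presumably keep an index in both directions.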
Thanks for the comments
--
DavidLeBlanc - 31 Mar 2001
You're welcome!
I really meant that I like typing a page name into the browser address bar -- being able to type it in the "go" box is a help, but sometimes I'm somewhere else entirely and I just want to type a page name into the browser address bar. I can use bookmarks, but my bookmark list keeps expanding, making it harder to find the right one. (One option I sometimes use is typing the beginning of a TWiki address, then letting the browser history / auto-completion feature display the name of the most recent TWiki page I've used, select that, and then edit the page name to the page I want now.)
--
RandyKramer - 07 Apr 2001
This system is going to require a web/topic index (your PNS, I guess), since the web/topic key is no longer the unique way to find the content.
I think that this overly complicates things, but it should be possible to create a system like this once PackageTWikiStore is complete. In fact a TWiki using a DBI backend would probably use some sort of OID concept.
--
NicholasLee - 31 Mar 2001
The Web/Topic key still is the unique identifier. It indexes into the PNS (think DNS->IP Address equivalent) to get the OID disk name. It's a bit slower (memory cache feasible?), but it does do a lot towards revision continuity, which no other renaming scheme that I know of does.
--
DavidLeBlanc - 31 Mar 2001
I guess "revision continuity" is important if you think that the web/topic key is part of the content, rather than a path to the content.
A memory cache is feasible only in a mod_perl type environment, although you could probably do something with shared memory and/or an external daemon.
--
NicholasLee - 31 Mar 2001
There are some interesting ideas above, but I'm unsure how the unique ID would work with the current dynamic linking of WikiWords and topics. At present, if TopicA references non-existent TopicB, then when B is created, someone viewing A will get a link. How does this work if A needs to contain a unique ID to B? It seems to me that on creating or renaming topics one always has to check existing topics, in all Webs. Similar to requirements if using caches.
--
JohnTalintyre - 01 Apr 2001
No reason the referring page can't have the text of the page's name and do a look up upon rendering to encode it as an internal double-bracket type link - or upon view to convert from wiki name to OID (again, as a double bracket link: [ [wiki word] [file://20010401001.txt] ]) - the argument passed up to the view function wouldn't change, but the viewer would do the name->OID conversion on the fly (this would essentially make this transparent to the user).
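A small Python sketch of that render-time lookup, under stated assumptions: the WikiWord pattern is a simplified approximation rather than TWiki's actual rule, the PNS is a plain dictionary, and the link notation simply mirrors the double-bracket example in the paragraph above.

```python
import re
from typing import Dict

# Simplified WikiWord pattern (an approximation for this sketch): a
# capitalised word followed by at least one more capitalised hump.
WIKI_WORD = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b")


def render_links(text: str, oid_by_name: Dict[str, str]) -> str:
    """Replace known WikiWords with double-bracket links to their OID files.

    Unknown WikiWords are left as plain text, so a reference to a topic
    that does not exist yet starts linking as soon as the name is registered.
    """
    def link(match: "re.Match[str]") -> str:
        name = match.group(0)
        oid = oid_by_name.get(name)
        if oid is None:
            return name
        return f"[[{name}][file://{oid}]]"

    return WIKI_WORD.sub(link, text)
```

Since the stored page text keeps only the wiki word, nothing in TopicA has to be rewritten when TopicB is created; the next render picks up the new table entry.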
Nicholas: Re: "web/topic key is part of the content" - I don't think a thing's name is the thing, so I guess it's part of the content (Are you "Nicholas" or "you") :-).
--
DavidLeBlanc - 01 Apr 2001
Nouns are things. Regardless, if TWiki::Store becomes generic enough, it shouldn't matter whether you use the ideas above or not. However, stage one of the work going forward is simply to provide the current mechanism.
--
NicholasLee - 02 Apr 2001
If name->OID happens on the fly when viewing, then I don't see how this makes renaming easier. The main difficulty with renaming is changing references. We could allow oldName->OID, but this will break if oldName is re-used.
--
JohnTalintyre - 02 Apr 2001
I am a bit afraid that this will prevent using the wiki as a source of data for scripts, and generating pages from scripts. One very nice feature of TWiki is its straightforward design, which can be grasped by a human being. Going in the proposed direction seems to complicate things with no real gain.
--
ColasNahaboo - 02 Apr 2001
The unique topic ID is a powerful way to keep track of object changes. ClearCase for example uses that to store elements (files, directories, ...), so ClearCase always knows what happened, even after a rename or a delete. You can for example go back in time and look at how a set of elements looked in the past.
To me it is questionable if we should go in this direction. It raises the complexity of TWiki and probably slows down TWiki because of the additional mapping. We should keep KISS in mind, as stated in the mission statement at ReadmeFirst.
However, this topic is a FeatureBrainstorming topic, so anything goes! It is better to look at 10 ways of doing things and pick the best solution than to look at only two alternatives.
--
PeterThoeny - 03 Apr 2001
I have to agree that this just adds to the complexity. I have been looking into this with respect to SearchEngineVsGrepSearch and the storage required for the indexes, and the only advantage I see in this proposal is keeping the size of the file names fixed; a similar advantage could be achieved by just setting a limit on the topic name size.
To address some points in the discussion: the revision continuity can be maintained as is. RCS does not store file names in its files, so by just renaming the files we can keep previous revisions during a rename.
The same can be said about CVS (even though it's probably not a good idea to go playing in the repository).
[This just means that RCS will not undo a rename when going back to an old revision. IOW if you rely on RCS and nothing else, all links to a renamed topic will break if you roll it back beyond a rename. The problem is that a rename is a global operation, affecting all files that have a link to the topic. (To put things into a different perspective, which may give useful insights: essentially this means the rename is potentially affecting every document in the WWW.) Rolling back a rename would mean going through all topics and switching topic names back. Besides, the code that does the rollback would have to go through the restored topic contents and change all references. This gets really ugly if topic Foo was renamed to Bar, and then a new topic named Foo was created. I think that's a lot more complexity than adding an indirection layer between topic name and file name. JoachimDurchholz - 12 Sep 2001]
The only addition I would consider in this respect is to store the topic name inside a meta tag in the file, not unlike the current categories (as a history of names), so that when somebody does a name/web change, the previous name could still be linked to the file (via a search), and the previous names could be shown as part of the file diff history. Or even as part of the view template, as "This topic was previously known as:.... "
As for administration, a simple perl utility could list all the name/web changes that have happened in a directory, by parsing these meta tags.
--
EdgarBrown - 03 Apr 2001
There are in fact some additions I'm considering to the data file that involve adding information to some of the rcsfile(5) fields. These should follow the KISS principle and not change the current behaviour, just add information. In particular, desc and log:
desc ::= desc string
deltatext ::= num
   log string
   { newphrase }*
   text string
are ripe for use and can even be parsed by shell tools from the rcs utilities' output.
--
NicholasLee - 03 Apr 2001
I think I prefer the idea of storing it in the actual text; that way it's efficiently accessible through perl (without requiring RCS system calls, or complicating the requirements of PackageTWikiStore). The information will just end up in the same files anyway; the only difference is the format in which it is stored.
Besides, in terms of storage space (or of search time, depending on the implementation), it will be more efficient to store it in the actual text, as the body is diff'ed for each new revision, but the logs and history are just incremental.
--
EdgarBrown - 05 Apr 2001
It's good to see the discussion above about storing information when moving/renaming topics. At present MoveTopic moves all the files (including RCS) and stores the move information in the log. I'm tempted to put in the optional function of having a file for each topic that stores where it came from and/or where it went to. This could be used for a variety of purposes:
- Viewing topic in TrashCanWeb - would tell you where topic came from and offer to put it back.
- Trying to view a topic that had moved - tell you it has moved and offer re-direct
- Fixing mistaken move
- Stats on moves
The file format would allow for more than one change e.g. A->B C->A.
If a file is put at source and destination then information is at both, but I think this better fits in with the way TWiki works.
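To make the "A->B C->A" case concrete, here is a small Python sketch. The record syntax (whitespace-separated "source->dest" pairs, oldest first) is an assumption based on the example above, not a defined format, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Move = Tuple[str, str]


def parse_moves(text: str) -> List[Move]:
    """Parse whitespace-separated "source->dest" records, oldest first."""
    moves: List[Move] = []
    for record in text.split():
        source, sep, dest = record.partition("->")
        if not (sep and source and dest):
            raise ValueError(f"malformed move record: {record!r}")
        moves.append((source, dest))
    return moves


def status_of(topic: str, moves: List[Move]) -> Tuple[str, Optional[str]]:
    """Report what the most recent record says about `topic`.

    ("moved_to", X)   -> the name is vacant; offer a redirect to X
    ("came_from", X)  -> the name is live; its content arrived from X
    ("no_history", None) otherwise
    """
    for source, dest in reversed(moves):
        if dest == topic:
            return ("came_from", source)
        if source == topic:
            return ("moved_to", dest)
    return ("no_history", None)
```

Only the latest record that mentions a name decides its status, so in "A->B C->A" the name A is treated as a live topic that came from C rather than as a redirect to B.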
--
JohnTalintyre - 06 Apr 2001
Mmmm.... If you are thinking of using a separate file for each topic, I don't really like the idea, as it just means that a ton of files would need to be kept in sync, increasing the overall complexity.
I still prefer to store the info inside the text itself (easier to keep in sync if there is only one file to worry about). Granted, it is harder to trace the history of moved files (and I hadn't even thought of deleted files), as a search has to be performed to find them.
However, the same way that was proposed in SearchEngineVsGrepSearch could apply here as well. Besides storing the info in the text (to make life easier for the view script), a separate DB_File could be kept with the move/rename/delete information--which could still be reconstructed, at least partially if something is permanently deleted--from the existing files.
Advantages: Just one more file to worry about (and that's total, not per topic) and it's not much of a worry, fast access to history information, faster execution for the view script in case that aka/former-name info is displayed.
--
EdgarBrown - 06 Apr 2001
I added some comments above, after my previous comments and DavidLeBlanc's followup.
--
RandyKramer - 07 Apr 2001
Mmmm.... There is an interesting discussion on implementation of meta tags in GenericMetaDataStoreForTopics, that surely relates to this one.
With that format the history info we are talking about could be something like
%HISTORY{"Old_name","Older_name","Oldest_name"}%
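A tag in that shape would be easy to parse. The Python sketch below reads the former names from a topic's text and builds a former-name index across topics, along the lines of the listing utility and separate index mentioned earlier in the thread. The exact tag syntax is taken from the example line above; the function names and the in-memory dictionary of topic texts are assumptions for illustration.

```python
import re
from typing import Dict, List

HISTORY_TAG = re.compile(r"%HISTORY\{([^}]*)\}%")
QUOTED = re.compile(r'"([^"]*)"')


def former_names(topic_text: str) -> List[str]:
    """Return the names listed in the first %HISTORY{...}% tag, newest first."""
    match = HISTORY_TAG.search(topic_text)
    return QUOTED.findall(match.group(1)) if match else []


def build_alias_index(topics: Dict[str, str]) -> Dict[str, str]:
    """Map each former name to the topic that now holds the content.

    `topics` maps current topic names to their raw text. If two topics
    claim the same former name, the first one seen wins.
    """
    index: Dict[str, str] = {}
    for current_name, text in topics.items():
        for old in former_names(text):
            index.setdefault(old, current_name)
    return index
```

Because the index is derived entirely from the topic files, it could be thrown away and rebuilt at any time, which matches the point made earlier that a separate index can be reconstructed from the existing files.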
--
EdgarBrown - 07 Apr 2001
Just in case this hasn't been covered. Content management systems typically have the content (as well as the meta data) compartmentalised. So, consider that what could go in the topic is <TOPICBODY> the current text of the entire topic </TOPICBODY>. This then creates space for as many fields as required for the content as well as for the meta data. (For example, <HEADLINE/>, <LEADING_PARAGRAPH>
--
MartinCleaver - 08 Apr 2001