
Feature Proposal: WEBLIST canmoveto and cancopyto

Motivation

On a TWiki site with thousands of webs, %WEBLIST{...}% may take too long. The rename/move page uses %WEBLIST{...}% to show the list of destination web options. CopyScript is going to introduce a similar page, which also uses %WEBLIST{...}% for a similar purpose.

In both cases %WEBLIST{...}% needs to return within several seconds at most, even if the site has thousands of top-level webs and subwebs.

Besides the large number of webs, ReadOnlyAndMirrorWebs, UsingMultipleDisks, and CopyScript also constrain which webs can be offered as destinations:

  • You cannot move or copy a topic to a slave or read-only web (ReadOnlyAndMirrorWebs)
  • You cannot move a topic to a web residing on a different disk. Copying a topic to a web residing on a different disk is fine (UsingMultipleDisks)

Description and Documentation

What's implemented already

This is already partially incorporated into the core on trunk. On the rename page, %WEBLIST{webs="...,canmoveto" ...}% is used in place of %WEBLIST{webs="...,public" ...}% in templates/*.tmpl, and TWiki::WEBLIST() and TWiki::Store::getListOfWebs() are enhanced accordingly. This proposal is raised to document that change for clarity and transparency.

To reduce the time taken, the canmoveto list is built in the following steps when RepositoryForSiteAndWebMetadata is in use and all webs are required to be registered, which should be the case if you have thousands of webs.

  1. Obtain the list of top-level webs from the metadata repository instead of traversing the {DataDir}
  2. Add the subwebs under the current web's top-level web to the web list
  3. If the user has their own personal subweb, add it to the web list
  4. Eliminate webs that are not writable or that reside on a different disk from the current web
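
The four steps above can be sketched as follows. This is only an illustration in Python, not the Perl code in TWiki::Store::getListOfWebs(). The Web record, its disk and writable fields, and the user_subweb argument are assumptions made for the example; the real data comes from the metadata repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Web:
    name: str        # e.g. "Eng" or "Eng/Tools" (subwebs use "/")
    disk: str        # hypothetical disk identifier
    writable: bool   # False for read-only or slave webs


def can_move_to(all_webs, current, user_subweb=None):
    """Return destination web names for a move out of `current`."""
    by_name = {w.name: w for w in all_webs}
    here = by_name[current]
    top = current.split("/")[0]

    # Step 1: top-level webs, as the metadata repository would list them.
    names = [w.name for w in all_webs if "/" not in w.name]
    # Step 2: subwebs under the current web's top-level web.
    names += [w.name for w in all_webs if w.name.startswith(top + "/")]
    # Step 3: the user's personal subweb, if there is one.
    if user_subweb in by_name and user_subweb not in names:
        names.append(user_subweb)
    # Step 4: drop webs that are not writable or sit on another disk.
    return [n for n in names
            if by_name[n].writable and by_name[n].disk == here.disk]
```

Subwebs of other top-level webs are never examined, which is where the time saving on a large site would come from.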

For CopyScript

For CopyScript, a further enhancement of a similar nature is needed. As mentioned under Motivation, the copy page's destination web options differ from the rename page's. A copy destination needs to be writable (not read-only or slave) but can be on a different disk from the source.

As such, the CopyScript page templates, namely copy.tmpl and copy.SKIN.tmpl, shall use %WEBLIST{webs="...,cancopyto" ...}% for destination web options. This is achieved by enhancing TWiki::WEBLIST() and TWiki::Store::getListOfWebs().

Excluding webs from WEBLIST result

You may want to exclude some webs from the canmoveto and cancopyto web lists.

You may rotate the Trash web periodically: delete Trash10, rename Trash9 to Trash10, ..., rename Trash1 to Trash2, rename Trash to Trash1, and then create a new Trash. In that case, it doesn't make sense to move a topic to Trash1, Trash2, ..., while moving to Trash is fine.

With copying, there is no value in copying to Trash or to a rotated Trash.

If you are UsingMultipleDisks, each disk has its own trash web, such as Trashx1x and Trashx2x. Those trash webs may be rotated too.

These exclusions shall be achieved by the following configuration parameters.

$TWiki::cfg{WEBLIST}{canmovetoExclude} = qr/^$TWiki::cfg{TrashWebName}(x\d+x)?\d+$/;
$TWiki::cfg{WEBLIST}{cancopytoExclude} = qr/^$TWiki::cfg{TrashWebName}(x\d+x)?\d*$/;
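
To check which web names the two patterns exclude, here is a small Python sketch. It assumes the trash web is named Trash and transcribes the two expressions for Python's re module. The excluded() helper is made up for the illustration, and re.escape() is an addition that the Perl settings above do not have.

```python
import re

TRASH = "Trash"  # stands in for $TWiki::cfg{TrashWebName}

# The same two patterns as the settings above, written for Python.
canmoveto_exclude = re.compile(rf"^{re.escape(TRASH)}(x\d+x)?\d+$")
cancopyto_exclude = re.compile(rf"^{re.escape(TRASH)}(x\d+x)?\d*$")


def excluded(pattern, web):
    """True if `web` would be dropped from the corresponding list."""
    return pattern.search(web) is not None
```

With these patterns, Trash and Trashx1x are excluded only from the cancopyto list, rotated webs such as Trash1 and Trashx1x2 are excluded from both lists, and an ordinary web such as Main is excluded from neither.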

Examples

Impact

Implementation

-- Contributors: HideyoImazu - 2012-12-21

Discussion

Possibly better to not make a distinction between copy and move? This is an implementation detail that should not be of concern to the admin. Behind the scenes, you could first try a move, and if that fails try a recursive copy & delete, and if that fails, raise an error.

-- PeterThoeny - 2012-12-21

What do you mean by the admin? Assuming it's a person in charge of a TWiki installation, TWiki admins don't need to care about canmoveto and cancopyto. Those are only on "More actions ..." pages.

I thought a reliable cross-disk topic move feature might eliminate the need for the distinction between canmoveto and cancopyto. It might even eliminate the need for multiple disk awareness of TWiki. But it turned out that's not the case. If TWiki uses multiple disks, both canmoveto and cancopyto are needed. So is TWiki's awareness of multiple disks. Here's why.

A topic can have a hundred or more attachments. If you move such a topic to a different disk naively, the chance of the move failing in the middle is not negligible. So you need to do it atomically as follows.

  1. Copy the attachment files ordinarily
  2. Copy the RCS file of the topic ordinarily
  3. Copy the topic *.txt file to a temporary file
  4. Rename the temporary file to the original file name (at this point the copied topic is recognized by others)
  5. Delete the original topic *.txt file (at this point the original topic is recognized as deleted)
  6. Delete the original topic RCS file
  7. Remove the topic attachment files and directory

This is not as bad as I thought. Steps 1 through 3 are worth considering as the way to copy a topic regardless of multiple disk support.
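
A minimal sketch of the seven steps, in Python for illustration only. It assumes the usual layout of Topic.txt and Topic.txt,v in the web's data directory and the attachments in a Topic directory under pub; the function name and the .tmp suffix are made up for the example. Step 4 is taken to be a rename of the temporary file, which works as a single step because the temporary file is already on the destination disk.

```python
import os
import shutil


def move_topic(src_data, src_pub, dst_data, dst_pub, topic):
    """Move `topic` between web directories that may be on different disks."""
    src_txt = os.path.join(src_data, topic + ".txt")
    src_rcs = src_txt + ",v"
    src_att = os.path.join(src_pub, topic)
    dst_txt = os.path.join(dst_data, topic + ".txt")
    tmp_txt = dst_txt + ".tmp"  # hypothetical temporary-file name

    # Steps 1-3: copy attachments, the RCS file, then the text to a temp file.
    if os.path.isdir(src_att):
        shutil.copytree(src_att, os.path.join(dst_pub, topic))
    if os.path.exists(src_rcs):
        shutil.copy2(src_rcs, dst_txt + ",v")
    shutil.copy2(src_txt, tmp_txt)
    # Step 4: rename on the destination disk; the copy becomes visible here.
    os.replace(tmp_txt, dst_txt)
    # Steps 5-7: remove the original, the text file first.
    os.remove(src_txt)
    if os.path.exists(src_rcs):
        os.remove(src_rcs)
    if os.path.isdir(src_att):
        shutil.rmtree(src_att)
```

If the process dies before step 4, the destination holds only files that no topic refers to. If it dies after step 5, the same is true of the source.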

To cope with half-baked copies and cross-disk topic moves, the following steps need to be taken periodically.

  1. Delete *.txt temporary files
  2. Delete orphan topic RCS files. Orphan means not having the corresponding *.txt file.
  3. Delete orphan attachment files and directories. Orphan means not having the corresponding topic *.txt file.

Again, if we have the improved topic copying mentioned further above, this clean-up makes it robust.
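
The periodic clean-up might look like the sketch below, again in Python and only as an illustration. The .txt.tmp suffix, the function name, and the min_age guard (which leaves recent files alone so that an operation still in progress is not disturbed) are assumptions that are not part of the steps above.

```python
import os
import shutil
import time


def clean_web(data_dir, pub_dir, tmp_suffix=".txt.tmp", min_age=3600):
    """Remove leftovers of interrupted copies or moves; return removed paths."""
    removed = []
    cutoff = time.time() - min_age

    def stale(path):
        # Fresh entries may belong to an operation that is still running.
        return os.path.getmtime(path) <= cutoff

    def has_topic(name):
        return os.path.exists(os.path.join(data_dir, name + ".txt"))

    for entry in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, entry)
        if not os.path.isfile(path) or not stale(path):
            continue
        # Step 1: temporary topic files. Step 2: RCS files without a topic.
        is_tmp = entry.endswith(tmp_suffix)
        is_orphan_rcs = (entry.endswith(".txt,v")
                         and not has_topic(entry[:-len(".txt,v")]))
        if is_tmp or is_orphan_rcs:
            os.remove(path)
            removed.append(path)

    # Step 3: attachment directories without a topic; subweb directories stay.
    for entry in sorted(os.listdir(pub_dir)):
        path = os.path.join(pub_dir, entry)
        is_subweb = os.path.isdir(os.path.join(data_dir, entry))
        if (os.path.isdir(path) and stale(path)
                and not is_subweb and not has_topic(entry)):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```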

Still, TWiki needs to be well aware of multiple disks. If TWiki is not aware of multiple disks and symbolic links in the data and pub directories are managed outside of TWiki, TWiki cannot delete a web completely. That outside-TWiki mechanism needs to know about the symbolic links anyway.

Moving a topic to a different disk may be manageable. But moving a web or subweb to a different disk may take very long -- a web may occupy dozens of gigabytes. For web moving, the list of writable webs on the same disk as the current web (canmoveto) is needed anyway. Meanwhile, a copy destination does not need to be on the same disk (cancopyto).

There is another reason for the need for multiple disk awareness by TWiki: web deletion. When deleting a web (which may hold multiple gigabytes of data), it needs to be moved to a trash web. Copying that much and then deleting it takes too long. Moreover, copying that much to the trash of a different disk may fill up that disk, while moving to the trash on the same disk doesn't cause a disk capacity issue. Consequently, each disk having TWiki data needs to have a trash web.

-- HideyoImazu - 2012-12-21

Good arguments.

-- PeterThoeny - 2012-12-28

Topic revision: r9 - 2013-02-18 - HideyoImazu