Feature Proposal: WEBLIST canmoveto and cancopyto
Motivation
On a TWiki site with thousands of webs, %WEBLIST{...}% may take too long. The rename/move page needs %WEBLIST{...}% to show the list of destination web options. CopyScript is going to introduce a similar page, which also uses %WEBLIST{...}% for the same purpose.
In both cases %WEBLIST{...}% needs to return within a few seconds at most, even if the site has thousands of top-level webs and subwebs.
Besides the large number of webs, ReadOnlyAndMirrorWebs, UsingMultipleDisks, and CopyScript also need to be taken into account:
- You cannot move or copy a topic to a slave or read-only web (ReadOnlyAndMirrorWebs)
- You cannot move a topic to a web residing on a different disk. Copying a topic to a web residing on a different disk is fine (UsingMultipleDisks)
Description and Documentation
What's implemented already
This is already partially incorporated into the core in the trunk. On the rename page, %WEBLIST{webs="...,canmoveto" ...}% is used in place of %WEBLIST{webs="...,public" ...}% in templates/*.tmpl, and TWiki::WEBLIST() and TWiki::Store::getListOfWebs() have been enhanced accordingly.
As such, this proposal is raised for clarity and transparency in this regard.
To reduce the time taken, the canmoveto list is built in the following steps, provided that RepositoryForSiteAndWebMetadata is in use and all webs are required to be registered, which should be the case if you have thousands of webs.
- Obtain the list of top-level webs from the metadata repository instead of traversing {DataDir}
- Add the subwebs under the current web's top-level web to the web list
- If the user has their own personal subweb, add it to the web list
- Eliminate webs that are not writable or that reside on a different disk from the current web
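The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not TWiki's Perl implementation: the Web record, the registry dictionary standing in for the metadata repository, and the function names are all hypothetical, and the registry is assumed to hold subwebs as well as top-level webs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Web:
    name: str       # "Eng" for a top-level web, "Eng/Docs" for a subweb
    disk: str       # hypothetical disk identifier
    writable: bool  # False for read-only or slave webs


def top_level(name):
    """Return the top-level web that a web or subweb belongs to."""
    return name.split("/", 1)[0]


def canmoveto_list(registry, current, user_subweb=None):
    """registry: dict of web name -> Web (stand-in for the metadata repository)."""
    cur_top = top_level(current)
    # Step 1: top-level webs come from the registry, not from a directory walk.
    names = [n for n in registry if "/" not in n]
    # Step 2: subwebs under the current web's top-level web.
    names += [n for n in registry if "/" in n and top_level(n) == cur_top]
    # Step 3: the user's personal subweb, if there is one.
    if user_subweb and user_subweb in registry and user_subweb not in names:
        names.append(user_subweb)
    # Step 4: drop webs that are not writable or that live on another disk.
    cur_disk = registry[current].disk
    return sorted(n for n in names
                  if registry[n].writable and registry[n].disk == cur_disk)
```

Under the same assumptions, a cancopyto list would keep the writable check in step 4 and drop the same-disk check.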
For CopyScript, a further enhancement of a similar nature is needed.
As mentioned under Motivation, the copy page's destination web options differ from the rename page's.
A copy destination needs to be writable (not read-only or slave) but can be on a different disk from the source.
As such, the page templates of CopyScript, namely copy.tmpl and copy.SKIN.tmpl, shall have %WEBLIST{webs="...,cancopyto" ...}% for the destination web options.
This is achieved by enhancing TWiki::WEBLIST() and TWiki::Store::getListOfWebs().
Excluding webs from WEBLIST result
You may want to exclude some webs from the canmoveto and cancopyto web lists.
You may rotate the Trash web periodically: delete Trash10, rename Trash9 to Trash10, ..., rename Trash1 to Trash2, rename Trash to Trash1, and then create a new Trash.
In that case, it doesn't make sense to move a topic to Trash1, Trash2, ... while moving to Trash is fine.
With copying, there is no value in copying to Trash or rotated Trash.
If you are UsingMultipleDisks, each disk has its own trash web, such as Trashx1x and Trashx2x.
Those trash webs may be rotated too.
These exclusions are achieved with the following configuration parameters.
$TWiki::cfg{WEBLIST}{canmovetoExclude} = qr/^$TWiki::cfg{TrashWebName}(x\d+x)?\d+$/;
$TWiki::cfg{WEBLIST}{cancopytoExclude} = qr/^$TWiki::cfg{TrashWebName}(x\d+x)?\d*$/;
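To see what the two patterns do, here is a small Python sketch that applies equivalent regular expressions to a sample list of web names. The TRASH constant stands in for $TWiki::cfg{TrashWebName}, the sample names and the filter_webs helper are hypothetical, and the Python version escapes the web name, which the Perl lines above do not.

```python
import re

TRASH = "Trash"  # stands in for $TWiki::cfg{TrashWebName}

# Same patterns as the two configuration lines, rewritten for Python.
CANMOVETO_EXCLUDE = re.compile(rf"^{re.escape(TRASH)}(x\d+x)?\d+$")
CANCOPYTO_EXCLUDE = re.compile(rf"^{re.escape(TRASH)}(x\d+x)?\d*$")


def filter_webs(webs, exclude):
    """Keep only the web names that the exclude pattern does not match."""
    return [w for w in webs if not exclude.search(w)]


webs = ["Main", "Trash", "Trash1", "Trash10", "Trashx1x", "Trashx2x3"]
print(filter_webs(webs, CANMOVETO_EXCLUDE))  # ['Main', 'Trash', 'Trashx1x']
print(filter_webs(webs, CANCOPYTO_EXCLUDE))  # ['Main']
```

The only difference is the final quantifier: \d+ requires a rotation number, so the current trash webs stay available as move destinations, while \d* also matches the unnumbered trash webs and removes them from the copy destinations.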
Examples
Impact
Implementation
--
Contributors: HideyoImazu - 2012-12-21
Discussion
Might it be better not to make a distinction between copy and move? This is an implementation detail that should not be of concern to the admin. Behind the scenes, you could first try a move; if that fails, try a recursive copy and delete; and if that fails, raise an error.
--
PeterThoeny - 2012-12-21
What do you mean by the admin?
Assuming it's a person in charge of a TWiki installation, TWiki admins don't need to care about canmoveto and cancopyto.
Those are only used on the "More actions ..." pages.
I thought a reliable cross-disk topic move feature might eliminate the need for the distinction between canmoveto and cancopyto.
It might even eliminate the need for multiple disk awareness of TWiki.
But it turned out that's not the case.
If TWiki uses multiple disks, both canmoveto and cancopyto are needed.
So is TWiki's awareness of multiple disks.
Here's why.
There can be a topic having a hundred or more attachments.
If you move such a topic to a different disk naively, the chance of the move failing midway is not negligible.
So you need to do it in an order that keeps the topic consistent at every point, as follows.
- Copy the attachment files ordinarily
- Copy the RCS file of the topic ordinarily
- Copy the topic *.txt file to a temporary file
- Rename the temporary file to the original file name (at this point the copied topic is recognized by others)
- Delete the original topic *.txt file (at this point the original topic is recognized as deleted)
- Delete the original topic RCS file
- Remove the topic attachment files and directory
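The sequence above can be sketched in Python as follows. This is only an illustration under stated assumptions: topics are assumed to live at data/Web/Topic.txt with the RCS file at data/Web/Topic.txt,v and attachments under pub/Web/Topic/, and the function name and the .tmp suffix are hypothetical. The temporary file is written in the destination directory so that the rename stays on one file system, which is what makes that step atomic.

```python
import os
import shutil


def move_topic(data_dir, pub_dir, src_web, dst_web, topic):
    src_txt = os.path.join(data_dir, src_web, topic + ".txt")
    dst_txt = os.path.join(data_dir, dst_web, topic + ".txt")
    src_pub = os.path.join(pub_dir, src_web, topic)
    dst_pub = os.path.join(pub_dir, dst_web, topic)
    has_rcs = os.path.exists(src_txt + ",v")
    has_pub = os.path.isdir(src_pub)

    # 1. Copy the attachment files.
    if has_pub:
        shutil.copytree(src_pub, dst_pub)
    # 2. Copy the RCS file of the topic.
    if has_rcs:
        shutil.copy2(src_txt + ",v", dst_txt + ",v")
    # 3. Copy the topic text to a temporary file next to its final location.
    tmp = dst_txt + ".tmp"  # hypothetical temporary-file naming
    shutil.copy2(src_txt, tmp)
    # 4. Rename the temporary file; the copied topic becomes visible here.
    os.replace(tmp, dst_txt)
    # 5. Delete the original topic text; the original now counts as deleted.
    os.remove(src_txt)
    # 6. Delete the original RCS file.
    if has_rcs:
        os.remove(src_txt + ",v")
    # 7. Remove the original attachment files and directory.
    if has_pub:
        shutil.rmtree(src_pub)
```

If the process dies before step 4, the destination holds only files that no topic text refers to; if it dies after step 5, the source holds only such files. In both cases the leftovers are orphans rather than a half-visible topic.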
This is not as bad as I thought. Steps 1 through 3 are worth considering as the way to copy a topic regardless of multiple disk support.
To cope with half-finished copies and cross-disk topic moves, the following steps need to be taken periodically.
- Delete *.txt temporary files
- Delete orphan topic RCS files. Orphan means not having the corresponding *.txt file.
- Delete orphan attachment files and directories. Orphan means not having the corresponding topic *.txt file.
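A periodic clean-up along these lines might look like the following Python sketch. It assumes the same hypothetical layout and .txt.tmp temporary-file suffix as a cross-disk move would use, and it treats a pub directory as a subweb (not an orphan) when the data directory has a directory of the same name. The min_age parameter is an added safeguard that is not part of the steps above: it skips recent files so that a copy still in progress is not mistaken for an orphan.

```python
import os
import shutil
import time

TMP_SUFFIX = ".txt.tmp"  # hypothetical temporary-file naming


def _old_enough(path, min_age):
    """True if the path was last modified at least min_age seconds ago."""
    return time.time() - os.path.getmtime(path) >= min_age


def clean_web(data_web_dir, pub_web_dir, min_age=3600):
    """Remove temporary files and orphans in one web; return removed paths."""
    removed = []
    for name in sorted(os.listdir(data_web_dir)):
        path = os.path.join(data_web_dir, name)
        if not os.path.isfile(path) or not _old_enough(path, min_age):
            continue
        if name.endswith(TMP_SUFFIX):
            # Leftover temporary topic file.
            os.remove(path)
            removed.append(path)
        elif name.endswith(".txt,v") and not os.path.exists(path[:-2]):
            # Orphan RCS file: Topic.txt,v without Topic.txt.
            os.remove(path)
            removed.append(path)
    if os.path.isdir(pub_web_dir):
        for name in sorted(os.listdir(pub_web_dir)):
            path = os.path.join(pub_web_dir, name)
            is_subweb = os.path.isdir(os.path.join(data_web_dir, name))
            has_topic = os.path.exists(os.path.join(data_web_dir, name + ".txt"))
            if (os.path.isdir(path) and not is_subweb and not has_topic
                    and _old_enough(path, min_age)):
                # Orphan attachment directory: no matching topic text file.
                shutil.rmtree(path)
                removed.append(path)
    return removed
```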
Again, if we have the improved topic copying described further above, this periodic clean-up makes it robust.
Still, TWiki needs to be well aware of multiple disks.
If TWiki is not aware of multiple disks and symbolic links in the data and pub directories are managed outside of TWiki, TWiki cannot delete a web completely. That outside-TWiki mechanism needs to know about the symbolic links anyway.
Moving a topic to a different disk may be manageable.
But moving a web or subweb to a different disk may take very long, since a web may occupy dozens of gigabytes.
For web moving, the list of writable webs on the same disk as the current web (canmoveto) is needed anyway.
Meanwhile, a copy destination does not need to be on the same disk (cancopyto).
There is another reason for the need for multiple disk awareness by TWiki: web deletion.
When deleting a web (which may hold multiple gigabytes of data), it needs to be moved to a trash web.
Copying that much data and then deleting it takes too long. Moreover, copying that much to the trash of a different disk may fill up that disk, while moving to the trash on the same disk does not cause a disk capacity issue.
Consequently, each disk having TWiki data needs to have a trash web.
--
HideyoImazu - 2012-12-21
Good arguments.
--
PeterThoeny - 2012-12-28