A distributed TWiki consisting of just offline clients would be great. The central WWW server required to run TWiki is more often than not a pain in the neck:

  • It's a central point of failure. If the server is sluggish or down, then the entire TWiki will suffer. (This has been an issue with SourceForge in the past. SourceForge is just too damn successful.)
  • The WWW server may not have the CPAN modules installed that you need. Or it may have a braindead Perl implementation. Or it may have an unreasonably low allowance for the time that a Perl script may run. Or one of the dozen other restrictions.
  • Worst of all, the WWW server may not give root privilege to hard-working TWiki administrators!

The WWW server also offers some advantages:

  • It provides a single synchronization point. The topics on the server are the definitive last word on what counts as valid, up-to-date content.
  • A server is by convention considered reliable. Clients of a distributed TWiki may drop out of the net without notice, or they may have a mail server that occasionally drops a message.

It is possible to emulate these advantages in a network of unreliable offline clients, but it does require some careful state tracking for change notifications. I will devise and post an algorithm for that if there's interest in it.

-- JoachimDurchholz - 03 Nov 2000

At the company I work for, we're currently setting this up. We have TWiki servers in three geos: one that would normally be called the Master Site (since it's the head office) and two in outlying offices. The actual project, though, was started in the outlying offices for various reasons, and they hold a lot of the content, largely because of unreliable international links.

Given that connections to remote offices can be unreliable, and that the person leading the project is in a remote office, the idea of having a centralised master site and MirrorSites is a nasty/icky one. (Though we are going down that route initially.)

The long-term goal is to allow local editing of all pages, no matter where they originated, and to deal with clashes in the following manner:

  • Keep the Twiki code base essentially as is.
  • Treat the data that TWiki edits as essentially a source tree.
  • Treat all files in the pub directories as binary.
  • Nightly (whatever the local geo calls nightly), lock out all access to updating the local TWiki server. (This is pretty important, really.)
  • Force RCS locks to be released on all pages, and then relock them as (say) CVSSync.

BLA

  • Then store a copy of all the ,v files
  • Remove the ,v files.
  • Move back in the CVS directories that were there the previous night.
  • Perform an update/check-in, noting the clashes this causes and moving the locally clashed files to a known location.
  • Move the CVS directories back out to the "safe storage" zone.
  • Move the RCS ,v files back from safe storage to where they were beforehand.

FOO

  • Do an RCS lock release/check-in.
  • Unlock the twiki tree.
  • Work through the TWiki .changes file, matching the clashed files against the TWiki name of the last editor. Find their email address from their registered user page (or by some other means if available).

It's relatively involved, awkward to an extent, and definitely brute force (a rough sketch follows below). The one thing it doesn't require, though, is modifications to the existing codebase, except for sensible DataAndCodeSeparation. Finally, it also offloads the work of integrating the fixes for the clashes onto the person it really matters to: the person whose edit got clashed locally.

Due to latencies, and not wanting to stress the network too much, we're also considering using rsync to move data to a central checkin location at point BLA, and rsync to move it back again at point FOO, but that's just an optimisation.
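
To make the above more concrete, here's a rough sketch of the nightly job in Perl. All the specifics - the directory names, the lock flag file the save script is presumed to honour, and the use of GNU tar and find to park files - are placeholders rather than our actual scripts; it's only meant to show the shape of the procedure, with minimal error handling and without the clash-notification step.

#!/usr/bin/perl
# Rough sketch of the nightly sync job described above (all paths are
# hypothetical; run this as user CVSSync so the RCS locks belong to it).
use strict;
use warnings;
use File::Find;

my $DATA = '/var/twiki/data';            # TWiki topics (*.txt plus *.txt,v)
my $SAFE = '/var/twiki/sync-safe';       # parked CVS/ dirs, ,v archive, logs
my $LOCK = "$DATA/.nightly-sync-lock";   # hypothetical flag the save script honours

sub run { system($_[0]) == 0 or die "failed: $_[0]\n" }

# Lock out all local updates for the duration of the run.
open my $fh, '>', $LOCK or die "cannot create $LOCK: $!";
close $fh;

# Release all RCS locks, then re-lock every page as the sync user.
find( sub {
    return unless /,v$/;
    ( my $work = $_ ) =~ s/,v$//;
    system( 'rcs', '-q', '-u', '-M', $work );   # break lock; may ask for a reason
    system( 'rcs', '-q', '-l', $work );
}, $DATA );

# Park the ,v files, restore last night's CVS/ directories, update and commit.
run( "cd $DATA && find . -name '*,v' -print0 | tar --null -T - -cf $SAFE/rcs.tar" );
run( "find $DATA -name '*,v' -delete" );
run( "cd $DATA && tar -xf $SAFE/cvs.tar" ) if -e "$SAFE/cvs.tar";
run( "cd $DATA && cvs -q update -d 2>&1 | tee $SAFE/update.log" );
run( "grep '^C ' $SAFE/update.log > $SAFE/clashes.txt || true" );
# Clashed files (clashes.txt) are moved aside and mailed about separately;
# if that isn't done first, CVS will refuse to commit them.
system( "cd $DATA && cvs -q commit -m 'nightly TWiki sync'" );

# Park the CVS/ directories again and bring the ,v files back.
run( "cd $DATA && find . -type d -name CVS -print0 | tar --null -T - -cf $SAFE/cvs.tar" );
run( "find $DATA -type d -name CVS -prune -exec rm -rf {} +" );
run( "cd $DATA && tar -xf $SAFE/rcs.tar" );

# Check the merged working files back into RCS and reopen the tree.
find( sub {
    return unless /,v$/;
    ( my $work = $_ ) =~ s/,v$//;
    system( 'ci', '-q', '-u', '-mnightly sync', $work );
}, $DATA );
unlink $LOCK;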

Points:

  • Brute force, simple, and would work.
  • Doesn't involve any changes to the existing TWiki framework - a simple bolt-on. (No need to add things to prevent people editing a MirrorSite.)
  • Allows people to use other methods for syncing as well as using CVS/etc.
  • Allows everyone to perform local edits - this is critical in the following situations:
    • Twiki Web used in multiple locations heavily linked by slow/unreliable/only occasionally up links.
    • OfflineReadWriteWiki
  • Requires good DataAndCodeSeparation - which we have implemented locally and works well.

-- TWikiGuest - 13 Jul 2001

Um, not relying on CVS/etc. isn't going to buy us much; TWiki is using RCS anyway. (BTW, you can use RCS to do anything that CVS does. CVS just remaps a few RCS concepts so that they work better in a project environment; CVS gives no real advantages over RCS if RCS is used from software.) -- JoachimDurchholz - 13 Jul 2001

    True - the idea was to use a framework that can act as a bolt-on addition rather than requiring a large number of modifications to TWiki. The idea of using CVS came from the following viewpoints:
    • We already have an installed CVS repository, so reusing that internally is appealing.
    • Simple - I can pretty much see how to do this now, rather than spending a lot of time on it (something I don't really have a lot of anyway).
    • No one method is going to be suitable for everyone, so allowing for differing models by requiring them not to touch the main code base too closely makes more distributed TWiki models available. (After all, some people may require all the TWikis to have the same visible content, which implies a need for a two-phase-commit type of thing.) [Main.TWikiGuest 14/Jul/2001]

I haven't been as active for TWiki as I intended, so things aren't as far along as I planned. Here's the model I have been working on:

A TWiki network is a set of TWiki sites that exchange update messages.

The sites register with each other; a site registered with another will receive updates from it.

Whenever an update is received, it's placed in an RCS branch. RCS branch numbers are assigned to update senders. This nicely keeps track of what was changed where and when.
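
To make the branch-per-sender idea concrete, here is a minimal sketch of filing an incoming update on the sending site's RCS branch. The site-to-branch table, the update fields, and the assumption that the branch already exists are all placeholders of mine, not existing TWiki code.

use strict;
use warnings;

# Hypothetical mapping from registered sites to RCS branch numbers.
my %BRANCH_OF_SITE = (
    'london.example.com' => '1.1.1',
    'sydney.example.com' => '1.1.2',
);

# File an incoming revision of a topic on the sender's branch.  Assumes the
# branch has already been created off the trunk (not shown here).
sub apply_remote_update {
    my ( $site, $topic_file, $new_text ) = @_;
    my $branch = $BRANCH_OF_SITE{$site}
        or die "update from unregistered site $site\n";

    # Check out and lock the tip of that site's branch.
    system( 'co', '-q', '-l', "-r$branch", $topic_file ) == 0
        or die "co failed for $topic_file\n";

    # Replace the working file with the incoming text.
    open my $out, '>', $topic_file or die "cannot write $topic_file: $!";
    print {$out} $new_text;
    close $out;

    # Check it in on the same branch; -u keeps an unlocked working copy.
    system( 'ci', '-q', '-u', "-r$branch", "-mupdate from $site", $topic_file ) == 0
        or die "ci failed for $topic_file\n";
}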

An update "potentially conflicts" if it starts from from a different version than another update. TWiki first tries to merge every potentially conflicting update into the other branch; if that works, all is OK, otherwise the updates "conflict".
There are various things that can be done to help the merge process.

  1. Have the "edit" module insert a line break after every full stop. (Adding gratuituous line breaks does't hurt the merge process.)
  2. Use WordDiff.
  3. End-of-file extensions are a common source of conflicts. If both changes have such an extension, take the later one and append it to the other file, this will eliminate the conflict (the conflict is essentially that RCS doesn't know which of the two extensions should go last, and the append will give it enough information to decide this.)
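
For point 1, here is a minimal sketch of what the edit/save path could do to the topic text before check-in. It is purely illustrative; a real version would have to leave TWiki markup, abbreviations, and verbatim sections alone.

# Break topic text into roughly one sentence per line so that RCS/diff3
# merges at sentence rather than paragraph granularity.  The extra line
# breaks do not change the rendered output.
sub split_after_full_stops {
    my ($text) = @_;
    $text =~ s/\.[ \t]+/.\n/g;    # full stop followed by spaces -> full stop + newline
    return $text;
}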

For a conflicting update, the site compares time stamps and considers the site with the latest time stamp "responsible". It sends an update to that site if it hasn't already. The author of the conflicting change is expected to manually merge the conflicting changes and enter this as the new input; this will create an update message that is mergeable for the other site, and the conflicting change is discarded.

The site could try various things to notify the author (send a mail, add a notice to his personal page, whatever), who's then obliged to do a manual merge. If the author fails to do the merge within a given time, his change is undone, though his changes are placed in his personal page to avoid data loss.
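
A sketch of this rule, with made-up update fields ({time}, {author}, {topic}) and a notify_author() stub standing in for whatever notification channel the site uses:

use strict;
use warnings;

sub notify_author {                     # stub: mail, a note on the personal topic, ...
    my ( $who, $message ) = @_;
    warn "notify $who: $message\n";
}

# Of two conflicting updates, the one with the later timestamp is the one
# whose author is "responsible" for the manual merge.
sub handle_conflict {
    my ( $first, $second ) = @_;        # two conflicting updates (hash refs)
    my $later = $first->{time} > $second->{time} ? $first : $second;
    notify_author( $later->{author},
        "Your change to $later->{topic} conflicts with an earlier one; "
      . "please merge it by hand within the grace period." );
}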

Points:

  • No constraints on what's in which directory.
  • Complex.
  • Uses no tools that aren't already used by TWiki.
  • No protocol for lost update messages. IOW webs will stay inconsistent until the topic is changed again.
  • Transport of update messages is easily separable from generation and processing them. I.e. it would be simple to allow for email, direct IP, FTP, shared directories as exchange media (helpful since laptops and/or corporate policy don't always allow every setup).
  • No fixed schedule for data exchange required. The exchange schedule can be adapted to the transport layer's demands.
  • Allows everybody to do local edits, which is important for the reasons listed above.
  • Allows everybody to do local edits, which will compromise access privileges. A site should be able to own webs and pages and not accept outside updates to them. It should also be possible to transfer ownership. (Possibly a %PAGEOWNER%=Site-Name variable in Web Preferences or in topics. If the owner changes this, the transfer of ownership will automatically be distributed to the new owner when he gets the topic update.)
  • It's possible to "break into" a TWiki network by sending updates. IOW the update messages should be secured by a cryptographic hash, or even encrypted (see the sketch after this list). I don't know enough about net security to do a design for this off the top of my head, though I think I could make something up given enough time. (IOW it's low on my priority list and won't be in the first versions of this.)
  • Depending on what site registers where, you get a different distribution topology. If every site registers everywhere, messages will be sent asynchronously. If all sites register with just one "more central" site, you get a hierarchy; if updates are collected and sent (say) on a daily basis, as a single message, this generates less traffic. (For a typical corporate network, I'd suggest a handful of "backbone sites" that register with each other to eliminate single points of failure, have two or three update paths to every site where the laptop users log in, and let each laptop register just with the server(s) that he's connecting to.)
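
As a starting point for the cryptographic-hash item above, here's a sketch that signs each update message with a keyed hash. Digest::HMAC_SHA1 is a standard CPAN module; the shared-secret-per-pair-of-sites scheme and key distribution are purely my assumption and left open.

use strict;
use warnings;
use Digest::HMAC_SHA1 qw(hmac_sha1_hex);

# Prepend a keyed hash to an outgoing update message.  Both sites must
# share $secret; how the secret is distributed is not addressed here.
sub sign_update {
    my ( $secret, $message ) = @_;
    return hmac_sha1_hex( $message, $secret ) . "\n" . $message;
}

# Verify and strip the hash on the receiving side; returns undef (and the
# update should be discarded) if the message was tampered with or the key
# is wrong.
sub verify_update {
    my ( $secret, $signed ) = @_;
    my ( $mac, $message ) = split /\n/, $signed, 2;
    return unless defined $message;
    return hmac_sha1_hex( $message, $secret ) eq $mac ? $message : undef;
}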

-- JoachimDurchholz - 13 Jul 2001


Observation

On the Twiki Codev site at the moment we've got a moderate number of related concepts:

To my mind, these are all flip sides of the same thing - the desire to take TWiki content away from a one-server setup. (Note, I'm not talking about TWikiWithCVS - that's just one possible means to achieve some of these, though it requires a large number of modifications to the main code base...)

Proposal

Support all of the above by adopting the following goals:

  • Any solution should try not to preclude other solutions.
  • Each solution should try not to touch the central code base as far as possible.
  • If the central code base does need to change, it should be done in as generic a way as practicable to make the new hooks useful to other solutions in the same area. (À la the rendering plugins that already exist.)

Things that help with these goals are the TwikiModularisation stuff and better DataAndCodeSeparation.

Forces

Forces for different models of the Non One Server Model:

  • Need to write at every location?
  • Data allowed to get out of sync at different locations?
  • Need for a centralised repository/reference site?
  • Others?

The message passing approach, central repository approach, mirrors, and publishing all achieve similar goals but with different benefits... Hence the proposals above... Thoughts?

-- TWikiGuest - 14 Jul 2001
