
Read-Write Offline Wiki

A read-write offline Wiki is for people in the field who need to change content while offline. (Topic started in OfflineWiki)

  • Pros:
    • Can edit and search content while offline.
    • Webs shared this way cannot be censored or controlled by a single individual or small group of individuals without significant implementation overhead. MS
    • Can be implemented as a bolt-on, and "grown into", rather than increasing the initial install hurdle MS
  • Cons/catch:
    • Setup issues: A web server and TWiki need to be installed on the client.
    • An intelligent merge is necessary, for example something like TWikiWithCVS.
    • Webs shared in this way lose the ability to perform access control, unless a heavyweight, tightly integrated approach is taken MS
  • How:
    • Work on content independently on different TWiki installations.
    • Synchronize periodically.

Related: OfflineWiki, ReadOnlyOfflineWiki, WikiClusters, WebsitePublishing, NonWebAccessToWiki, TWikiWithCVS, TWikiXML, TWikiWithClearCase, ReplicationTechnologiesForTWiki, ReplicationUsingUnison, TWikiForWindowsPersonal

-- PeterThoeny - 23 May 2000

Setup issues: A web server on the client could be avoided if TWiki is (optionally) extended with a primitive httpd server mechanism that talks only to 127.0.0.1. Does anybody know of GPLed source we could use?
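
For illustration, a minimal sketch of such a localhost-only server using the HTTP::Daemon module from libwww-perl (the port, document root and URL mapping are placeholders, not a proposed TWiki interface):

    #!/usr/bin/perl
    # Minimal localhost-only web server sketch. Assumes HTTP::Daemon
    # (libwww-perl) is installed; port and paths are arbitrary examples.
    use strict;
    use warnings;
    use HTTP::Daemon;
    use HTTP::Status;

    my $d = HTTP::Daemon->new(
        LocalAddr => '127.0.0.1',   # only talk to the local machine
        LocalPort => 8080,
    ) or die "Cannot start daemon: $!";
    print "Local TWiki reachable at ", $d->url, "\n";

    while (my $c = $d->accept) {
        while (my $r = $c->get_request) {
            if ($r->method eq 'GET') {
                # Serve files from a local TWiki tree (path is a placeholder).
                $c->send_file_response("/home/user/twiki/pub" . $r->uri->path);
            } else {
                $c->send_error(RC_FORBIDDEN);
            }
        }
        $c->close;
        undef($c);
    }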

DavidChess sent me these URLs of Perl-based web servers: -- PeterThoeny - 07 Jul 2000

Merging issue: TWikiWithCVS is not necessarily needed, just the functionality of CVS. Or, it could be done in a primitive way by simply showing the diffs in case there is a conflict, and then letting the user merge the content manually with a cut&paste operation.

-- PeterThoeny - 06 Jul 2000

No "primitive merge" is required; RCS has exactly the same merge functionality as CVS. (In fact CVS uses RCS merge. CVS is "just" a framework for managing RCS revisions; the actual file manipulation is done entirely in RCS.)

There may still be conflicts that the automatic merge cannot resolve, so a manual merge facility is still needed.

We might want to add some special intelligence for common conflict cases. Off the top of my head, if two people add stuff independently to the end of a topic then there will be a conflict because RCS cannot know in which order the additions should be appended. In this case, we could simply sort the new sections according to edit date. This could be achieved by appending the signature lines in proper order and then reapplying both changes (the signatures have a date, so it's easy enough to sort them, and they should be enough to guide RCS - experimentation is needed though).
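
A speculative sketch of that sorting step, assuming signature lines of the usual "-- WikiName - 22 Nov 2000" form (as the paragraph above says, this is untested):

    #!/usr/bin/perl
    # Speculative sketch: order independently appended sections by the date
    # in their "-- WikiName - 22 Nov 2000" signature lines.
    use strict;
    use warnings;
    use Time::Local;

    my %month = (Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4, Jun=>5,
                 Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11);

    # Epoch time of the last signature line in a section, 0 if unsigned.
    sub section_time {
        my ($text) = @_;
        my $t = 0;
        while ($text =~ /^-- \w+ - (\d{1,2}) (\w{3}) (\d{4})\s*$/mg) {
            $t = timelocal(0, 0, 0, $1, $month{$2}, $3 - 1900);
        }
        return $t;
    }

    # Each argument file holds one of the conflicting appended sections.
    my @appended;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "$file: $!";
        local $/;                                  # slurp the whole file
        push @appended, scalar <$fh>;
    }
    my @ordered = sort { section_time($a) <=> section_time($b) } @appended;
    print @ordered;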

-- JoachimDurchholz - 22 Nov 2000

Hi Peter, I have written a small Perl hack, GetAWebAddOn, that packages the data, templates and attachments of the web of choice as a zipped/tar.gz/tar.bz2 file.

I have not addressed the merge requirement above, so I am not sure whether this note shouldn't be moved to ReadOnlyOfflineWiki.

-- AndreaSterbini - 15 Aug 2000

Here's my list of ideas on the issue. This is all written strictly from my personal perspective, with no attempt at generalizing the design for other setups; I just hope that my setup is common enough to warrant mindspace smile

My setup is the following:
I'm connected to the Internet with a dial-up line. I'm offline most of the time; when I wish to exchange data with a server, I go online, initiate the transfers, and go offline again. (This setup is very common. Connection fees are extremely high in Europe, at least in comparison to US rates.)
I'm also one of those Windows-challenged guys. This enforces some peculiarities (setting up a WWW server or a Perl interpreter is a major task, for example).

My need is the following:
I have a project with a TWiki hosted on SourceForge (actually it could be any WWW server with a Perl engine). As I'm mostly living offline, and transmission times are at a premium, I want to connect once per day, upload all changes that I have made, download all changes that others have made, and disconnect, leaving any conflicts to be resolved offline.

This requires mechanisms for the following areas:

Identify changes in the WWW server.
This is already done (the email notifications).

Identify local changes.
I'm doing an explicit connection build-up anyway, so it would be reasonable to require that I click a "wrap my local changes up" button that runs a script over all TWiki topics and compares them with previous versions.
This could be optimized: let the edit script take a note of which topics were edited, and have the change wrap-up script touch just those topics.

Extract changes so they can be merged later.
This one is simple. Either manage the stuff locally using RCS (this requires a working local RCS installation; that's not too difficult even under Windows), or just have the edit script keep a copy of the original file around somewhere and let the wrap-up script do a diff. (Installing diff essentially means installing the core of RCS, so the difference isn't as large as it might seem at first sight. In particular, we'll need diff3 locally later, and that is most definitely a part of RCS.)
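
A sketch of the second variant (diff against a kept copy); the assumption that the edit script leaves a .orig copy next to each edited topic, and the data/outbox paths, are made up for this example:

    #!/usr/bin/perl
    # Sketch of a "wrap my local changes up" script: for every topic that
    # the edit script left a .orig copy of, write a unified diff into an
    # outbox directory. The data/ and outbox/ paths are placeholders.
    use strict;
    use warnings;
    use File::Basename;

    my $web    = 'data/Main';
    my $outbox = 'outbox';
    mkdir $outbox unless -d $outbox;

    for my $orig (glob "$web/*.txt.orig") {
        (my $topic = $orig) =~ s/\.orig$//;
        my $name  = basename($topic, '.txt');
        my $delta = `diff -u "$orig" "$topic"`;   # plain diff is enough here
        next unless $delta;                       # topic unchanged
        open my $out, '>', "$outbox/$name.diff" or die $!;
        print {$out} $delta;
        close $out;
    }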

Transmit change files to offline clients.
There are many options for this, each with advantages and disadvantages.
Email: Advantage is that everybody has it. It also offers the intriguing possibility to run a DistributedTWiki without a central WWW server: every offline client just sends the change emails to all participants. Disadvantage is that getting the change mails out of the mailbox can be a challenge: every mail client has different ideas about how mails should be stored locally. Another disadvantage is that mail is unreliable, so the offline client must have a way to detect lost mails and to request a resend.
Email (2): Tell users to set an email account aside for change emails, and to not tell their mail user agent about it. Use Mail::POP3 or Mail::SMTP to retrieve the change emails. Advantages: everybody has email. Setting up a POP3 account on GMX or AOL is dead simple and doesn't even cost money. Disadvantage: retrieving the mail requires that the offline client get along with the idiosyncrasies of the various mail servers in the world. Some mailers use POP, others SMTP. Some mail servers append advertisements to every message. Other than that, this offers the same advantages and disadvantages as the first Email variant.

FTP: This protocol generally isn't very suitable for distributing changes (been there, done that: it's really clunky). There's no easy backchannel to tell the server to resend a change. The server needs a good naming scheme for files waiting to be retrieved. And the server never knows when it can delete a file. The other disadvantage is that it definitely needs a server.

HTTP: Communicate changes with the server via HTTP GET and POST requests. Advantage: no need to access protocols that the TWiki scripts don't need to use anyway (keeping the maintenance burden down). Disadvantage: requires a central WWW server.
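
Returning to the second email variant above, here is a sketch using the Net::POP3 module that ships with libnet; the server, account, inbox directory and the "[twiki-change]" subject tag are all invented for this example:

    #!/usr/bin/perl
    # Sketch: fetch change mails from a dedicated POP3 mailbox and store
    # them for the merge step. Host, credentials and the subject tag are
    # assumptions made for this example only.
    use strict;
    use warnings;
    use Net::POP3;

    my $pop = Net::POP3->new('pop.example.com')
        or die "Cannot connect to POP3 server\n";
    defined $pop->login('twiki-offline', 'secret')
        or die "POP3 login failed\n";

    mkdir 'inbox' unless -d 'inbox';
    my $msgs = $pop->list() || {};                # { message number => size }
    my $n = 0;
    for my $num (sort { $a <=> $b } keys %$msgs) {
        my $lines = $pop->get($num) or next;      # array ref of message lines
        my $text  = join '', @$lines;
        next unless $text =~ /^Subject:.*\[twiki-change\]/mi;
        open my $fh, '>', sprintf('inbox/change-%04d.txt', ++$n) or die $!;
        print {$fh} $text;
        close $fh;
        $pop->delete($num);                       # drop it once stored locally
    }
    $pop->quit;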

Merging changes.
This is simple: Just call the appropriate RCS or Patch command.
Applying a merge may result in a conflict; in that case, no unsupervised merge is possible.
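
For concreteness, a sketch of that call using the merge(1) command that comes with RCS; the three file names are placeholders, and an exit status of 1 means conflict markers were left behind for manual resolution:

    #!/usr/bin/perl
    # Sketch: three-way merge of a topic with RCS's merge(1).
    #   mine.txt     - local offline copy
    #   ancestor.txt - common revision both sides started from
    #   theirs.txt   - copy fetched from the server
    use strict;
    use warnings;

    my ($mine, $ancestor, $theirs) = ('mine.txt', 'ancestor.txt', 'theirs.txt');

    # merge edits $mine in place; exit 0 = clean, 1 = conflicts, >1 = error.
    my $status = system('merge', $mine, $ancestor, $theirs) >> 8;

    if    ($status == 0) { print "Merged cleanly.\n"; }
    elsif ($status == 1) { print "Conflicts left in $mine - manual merge needed.\n"; }
    else                 { die "merge failed with status $status\n"; }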

Handling conflicts.
The best strategy that I can think of is the one taken by CVS: don't try anything fancy, just tell the author that the merge failed due to a conflict, and send him an update of the file. The user can then reapply his changes to the updated file and send that.
Note that RCS can automatically resolve merge conflicts that apply to different parts of a topic. This works well enough in practice; if people want to see what came of such a merge they can always look at an RCS diff to see what was done.

-- JoachimDurchholz - 03 Nov 2000

Being oriented completely the other way - in no way Windows-challenged (i.e. Unix on my laptop with web server, CVS, RCS etc., and the same on a desk machine at work) - I'm taking the other approach: a central repository of text controlled by CVS, edited using a text editor called "TWiki" (IYSWIM smile ) which runs pretty much as normal. I.e. many editors, each with local RCS locking/control, who periodically update/merge their changes with the central text repository. The clear win here is that if the central repository dies, the text does not die with it. It also means there's very little code I have to write.

Since I'm adding functionality (for my convenience!) to publish a TWiki web, and also to allow me to intelligently add email messages with auto thread/topic detection (beyond the normal header schemes - the aim is automatic folder creation rather than manual procmail-style rules), I'm doing it my way for my convenience smile

I suspect that when a few people have implemented something that works right for them, merging the ideas that work in practice will result in something much better than a theoretical "I like doing it this way"... Anyhow, what I'm implementing:

Distributed/Roving TWiki.

TWiki is great for a single-server installation. In a multiple-server installation (e.g. as a note-taking tool on multiple laptops) syncing with a central server, it's pretty naff. Idea decided upon:

  • Central CVS repository of documents.
  • Each TWiki checks out its own copy, moves CVS to a different directory, and creates an RCS directory in its place.
  • DON'T need to check the docs in to RCS; they will be checked in by TWiki correctly.
  • Then mv RCS to somewhere else
  • Then mv CVS back
  • Then commit changes (a sketch of this cycle follows below).
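
A literal transcription of the steps above as a script, purely for illustration; the CVSROOT, web name and "-aside" directory names are placeholders:

    #!/usr/bin/perl
    # Sketch of the checkout/shuffle listed above, for one web.
    # CVSROOT, the web name and the "-aside" names are placeholders.
    use strict;
    use warnings;

    my $cvsroot = ':ext:user@cvs.example.com:/cvsroot/wikidata';
    my $web     = 'Main';

    sub run { system(@_) == 0 or die "command failed: @_\n"; }

    run('cvs', '-d', $cvsroot, 'checkout', $web);   # central copy of the web
    chdir $web or die $!;
    run('mv', 'CVS', '../CVS-aside');               # move CVS bookkeeping out of the way
    mkdir 'RCS' or die $!;                          # TWiki checks topics in here by itself

    # ... TWiki now runs against this directory; no manual "ci" is needed ...

    run('mv', 'RCS', '../RCS-aside');               # then mv RCS to somewhere else
    run('mv', '../CVS-aside', 'CVS');               # then mv CVS back
    run('cvs', 'commit', '-m', 'sync local edits'); # then commit changes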

Theory of operation:

  • Documents are stored on a central repository.
  • Every TWiki server is a client of this repository. No master of the repository. It should be possible to make one TWiki a master TWiki though, with this logically being the "maintainer" of the repository. Therefore a TWiki web simply becomes something stored in the repository, essentially not changing the semantics of TWiki's code. (No huge change to code - allows incremental transition.)

  • Therefore each server can make modifications to its heart's content. As normal, only one user can be editing a document on a single TWiki at a time, using TWiki's normal RCS locking.
  • Periodically the TWiKi admin (or the server itself) performs updates to the central trees, informing the TWiKi admin of conflicts to be resolved.

The key points to deal with are:

  • Conflict editing.
  • Updates - automatic/manual.
  • Change control notes. (Merging of modification data... - can simply log...)
  • A CVS-aware TWiki needs to exist in order to perform online rollbacks. (This can be done by the TWiki attached to the central repository, rather than by roving TWikis.)

RCS TWikis then become online roving editor applications for editing documents grabbed via CVS. CVS TWiki (NB I don't necessarily mean what the topic on CVS TWiki means by a CVS TWiki!) is an editing/publishing point - in the short term, this isn't a vital thing for me... (But it will be needed.)

Editing for approval/content update. Publishing for the ability to modify things online.

Required extra features:

  • Ability to do access control a la the Unix file system.
(i.e. access control on a per-topic basis much like the UFS - preferably via an access-lists approach rather than just allowing people to edit everything in a web - enforcing control using Apache is possible, but currently very messy.)

-- TWikiGuest - 21 Nov 2000

There's one downside to this approach: It requires a central administrator who's willing and able to resolve conflicts. This is fine if the admin is (a) truly dedicated and willing to serve until the TWiki is taken down and (b) able and willing to resolve conflicts in a manner that none of the original authors has reason to object to.

If the participants are sloppy at exchanging their data, the central administrator will have lots of conflicts to resolve. IOW this is a scenario of "if I'm sloppy it won't hurt me", which tends to end in abuse.

A fully decentral TWiki doesn't have this problem. If I'm sloppy and don't exchange my data, I risk getting into a conflict and having to resolve it. If getting connected is truly difficult (such as being mobile on another continent for a while), then I have more options: I can strike a personal balance between the hassle of setting up a network connection and that of resolving the accumulated conflicts.

-- JoachimDurchholz - 21 Nov 2000

There might be another way around the problems of off-line working, one that makes support for merges simple and straightforward.

Here is my basic idea. Introduce a MergeLink that is rendered slightly differently from normal links (for example followed by an exclamation mark). Such a MergeLink! indicates that somebody is going to change something here. In a (temporary) topic page there is some (tagged) intentional information, such as a regular expression characterizing a small fragment of text before the MergeLink. A preview highlights this fragment in the page to be changed.

Then one downloads the page to be changed together with its temporary change-page. The replacement is now edited off-line in the temporary page.

The on-line editor has 2 small windows instead of 1. Being online again, one does a copy/paste of the temporary page into the second window and pushes a new merge button. One can preview the result, do a little tuning if needed, and then confirm the changes, after which the temporary page is removed and the original page is updated.

The merge algorithm first looks for the associated MergeLink, then checks whether the characterizing regular expression is still valid for the neighbourhood above the MergeLink. In case of failure it is up to the user what to do.

The next step is to let the mechanism also work in case you want to make changes off-line that you did not indicate beforehand with a MergeLink. It seems reasonable to assume that one has off-line (a subset of) the same editing, merge and preview facilities as on-line. Then the off-line and on-line procedures can be made quite symmetrical. First add a MergeLink to the page to be changed, then edit the temporary page, and check the preview. Going back on-line, the algorithm now uses only the regular expression for fragment resolution, because the MergeLink is absent in the original. If resolution fails one has to inspect why, and take manual action such as skipping the change, tuning the regular expression or adding the missing MergeLink for explicit resolution.
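
A rough sketch of how that resolution step could look; this is only an interpretation of the proposal, not an existing TWiki facility, and the function name and sample strings are invented:

    #!/usr/bin/perl
    # Speculative sketch of the MergeLink/MyContext resolution step: splice
    # the offline replacement in only if the recorded context still matches,
    # otherwise hand the problem back to the user.
    use strict;
    use warnings;

    sub apply_context_change {
        my ($topic_text, $context_re, $old_fragment, $new_fragment) = @_;

        # Precondition: the characterizing context must still be present.
        return (0, $topic_text) unless $topic_text =~ /$context_re/;

        # Replace the planned fragment in the neighbourhood of the context.
        if ($topic_text =~ s/($context_re.*?)\Q$old_fragment\E/$1$new_fragment/s) {
            return (1, $topic_text);
        }
        return (0, $topic_text);
    }

    # Hypothetical usage:
    my ($ok, $merged) = apply_context_change(
        "Intro text.\nChorus: la la la\nVerse two follows.\n",
        qr/^Chorus:/m, 'la la la', 'fa fa fa',
    );
    print $ok ? $merged
              : "Conflict: context no longer matches, please merge by hand.\n";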

Well, I hope this explanation is clear enough and that the approach makes sense as a step forward towards enabling an easy way to support off-line working.

-- TheoDeRidder - 20 Nov 2000

Hmm... if I understand this correctly, you're proposing a way to annotate changes.

This is not necessary!

RCS is well-equipped to detect changes without assistance. Just let RCS run over two different texts and tell you where the changes are. No need to write a MergeLink, no need to design a regular expression (which would be beyond the abilities of many people anyway, including myself).

It might be interesting to outline changed-but-not-yet-accepted text; this could stand in for the visual aspect of the MergeLink (if I understand that idea correctly, that is!). E.g. everything that's changed between the last known server revision and the local copy could be displayed as dark green instead of black. (Here's another opportunity for a color configuration <grin>).

-- JoachimDurchholz - 21 Nov 2000

No, not the changes themselves, but only a temporary annotation of the 'intention' of a change. You know you are going to change something, but have to think about it (off-line in the sunshine). You indicate the involved area (I prefer direct colouring, but thought that is not generally available, so suggested a regular expression (but just a number of lines is presumably good enough)). The indication helps others to take a little care about the future, and helps you yourself with merging it in later on.

Or, another way of saying the same thing: a mechanism for outlining the area that should have remained untouched at the moment of merging. Then the term MergeLink may be better replaced by something like MyContext.

So, while RCS is indeed taking full care of the past, my little idea is about some preparation for a future change.

-- TheoDeRidder - 22 Nov 2000

This is called "pessimistic locking" in a revision control context: the MyContext marker essentially locks the stretch of text against changes by other people until you have made your changes.

Pessimistic locking has serious disadvantages:

  1. It's too easy to place a lock and forget to remove it. In a short time, all topics will be riddled with locks, and whenever you want to change a text you'll have to ask the lock holder to remove it. (The worst-case scenario is a drop-out who no longer has the interest, time, or ability to remove the locks.)
  2. You don't usually know in advance what you're going to change. At least not exactly enough to place the locks exactly where you're going to apply your changes. This means that people will have a tendency to lock more text than they actually need (and usually with a bad conscience, which means that the TWiki experience will be less fun).
  3. Whenever you have a change in mind, you'll have to take note of the change (to avoid forgetting about details), apply the lock(s), wait until they are confirmed, apply the change, and check the stuff back in. This is a very tedious process that's making the TWiki experience even less fun.
  4. If you apply multiple locks, two people can accidentally lock each other out. All that's required is that Ann applies a lock on text A, and days later decides that she also needs text B to complete her change. Unfortunately, Barbara has decided to lock B in the meanwhile, only to discover that she needs A as well. Now both will find that the text that they need is already locked, and wait indefinitely for the other to release the lock so they can apply the change. Well, dumb computers would go into an infinite wait (deadlock), humans will probably let the issue drift into oblivion, maybe negotiate to get the change through, or just quit the TWiki.
  5. And, last, it adds administrative overhead even if there's never a conflict.

It is possible to work around these problems, e.g. one can add a mechanism for storing a change and having it automatically carried through once the lock is approved... but that's already achieved by the current resolve-conflicts-when-they-occur strategy; the only difference is that the change will be stored in the topic instead of in a separate file (and it's in the topic where I usually want to see the changes - if I work with TWiki, I don't want to bother about who's got my version unless there's really a conflict).

-- JoachimDurchholz - 22 Nov 2000

Let me have another go at conflict resolution. If I have a fully decentral offline TWiki, the worst conceivable situation is a "net split": The network is partitioned into two subnets that don't have any communication for a while.

While the split is in effect there's no problem: everybody makes modifications, and they're checked into RCS and (somehow) internally redistributed.

In the moment when the subnets are rejoined, the topic changes that have accumulated during the split must be merged. I.e. I have a timeline like this (assume all changes would be in conflict, nonconflicting changes are automatically merged by RCS and are thus uninteresting):


Revision history  Event log
      1.1         Topic created
       |
      1.2         Topic changed
       |
      / \         Communication between subnets ceased
     /   \
   1.3A   |       Some change in subnet A
    |     |
    |    1.3B     Some change in subnet B
    |     |
     \   /
      \ /
       |          Communication reestablished
       |          1.3B revision is sent to A subnet

What happens next?

First, the A subnet repository must know which revision the 1.3B version was based on, so this information must be transmitted together with the change itself. (I'm not sure whether TWiki does this already, but it would be a good idea if it doesn't: it prevents mixups if two people start editing the same topic simultaneously.)

Now the A subnet knows what was done in B. It does a three-way diff: 1.3A vs. 1.2 vs. 1.3B, and sees the conflicts. (If it doesn't see a conflict it's done: it just merges the changes. It should also send its 1.2-to-1.3A delta to the B subnet, which will be able to merge that change without a conflict as well, and all's done.)

The conflict must be resolved, but since all automated conflict resolution has failed, the issue must be presented to a human. The human will see a conflict warning and get a request to resolve the conflicts.

There are two human candidates to send the request to: the one responsible for 1.3A and the one for 1.3B. I'd say the request should go to the author of 1.3B; after all, 1.3A was done earlier, and it's more likely that its author has forgotten the details of his change. Besides, I like to reward faster changes over slower ones; loading a page and keeping it open for editing for days shouldn't become a tactic to avoid pesky conflict change requests.

After resolution, the result is sent back to A, and we have a new common 1.4 version established in both subnets.

What if there are multiple changes, with potentially many interleaving changes in both subnets? It would be possible to reconstruct a series of interleaving changes and have the author of the later change redo it in a merge, but this would be too much work. Often it's easier to redo a bunch of changes in one go instead of painstakingly retracing each change.

So if there's a revision history like this:

    1.1
    1.2
 1.3A
 1.4A
      1.3B
      1.4B
 1.5A

then the author of 1.5A will get a message with the deltas of 1.2->1.5A and 1.2->1.4B, with a request to merge the changes into a new, common 1.6 revision.

What if the 1.5A author is lazy (or ill, or was run over by a bus) and ignores the request? Assume somebody has a 1.4B version on his machine and adds a change; the moment his new 1.5B change is distributed, a conflict will arise and the author of 1.5B will get a conflict resolution request.

Hmm... when reviewing my changes, I looked at the top of this topic and found that Peter had exactly this in mind on 6 Jul 2000. IOW we're back full circle, with the additional twist that this merge isn't necessarily restricted to a scenario with a central server; it should work reasonably well even if all TWikis are offline. (Everybody should have an RCS installed, though.)

BTW all this merging and branching can be mapped directly to RCS. RCS has facilities to store and retrieve independent branches of change for the same document, and it has commands to automatically merge branches. It will just complain if there's a conflict during the merge, asking the user to add enough changes to the head revision of either branch to make a merge possible. Sounds familiar...

-- JoachimDurchholz - 22 Nov 2000

Of course, if my idea was really just 'pessimistic locking' your remarks about the complications and a non-twikilian moral are quite right.

Well, let me try to give an example to illustrate my rationale.

A group of songwriters is working on a document containing songs. A typical aspect of a song is that any change can unbalance the whole song; therefore the absence of 'technical' conflicts does not mean the absence of poetic or melodious conflicts. So a songwriter working on a song indicates he is working on it. That does not mean a lock; the others may edit it just in the usual twikilian way. It is only offering a type of awareness, not forcing behaviour. With awareness a partner could, for example, choose in what order to make his changes (if # > 1) to enable a better workflow in the cooperative writing.

On the other side (of the merger), the indication helps as a precondition for an automatic merge; only when the precondition fails does the merger have to inspect manually whether the song is still balanced. Without the indication the merger always has to inspect manually whether any of the other changes made since he checked out touch the song.

If supporting a little bit of workflow is polluting the simple concept of TWiki, then my idea can also be applied only as a little help during merging (without any indication in an earlier stage). The hypothesis behind 'help' is that marking what should be invariant is less work than inspecting what might be changed and wrong (in particular in the case of multiple changes).

One might say that each song should be a separate topic. But I think that each well-written piece of human language is full of small song-like fragments.

-- TheoDeRidder - 24 Nov 2000

Ah yes, now I understand. This should work. I'm not sure whether implementing it is worth the effort, but then one is never sure about new things.

However, if the change forewarnings are just advisory notes, I don't see how they help with offline editing and reconciling changes. Could you elaborate?

Re the song-like quality of human language: I fully agree, but the key point is "well-written". WikiWikis are used for quickly exchanging ideas and building consensus, and with that use, literary quality is not considered important enough to spend any effort in that area. (This doesn't mean that TWiki cannot be made into a form that attracts people with interests in cooperatively creating high-quality texts, so if you feel that your idea contributes, go ahead and implement it!)

-- JoachimDurchholz - 25 Nov 2000

I used a poetic metaphor to explain a technical mechanism for proactive help with semantic conflicts. The mechanism is: making the granularity of a possible merge conflict tunable by the author of a change. The granule (larger than the change itself) is related to the document state that an author sees just before his change. So indicating the granule can be done at any time the author sees the document in that state.

Part of helping to reconcile conflicts by highlighting just the semantically delicate regions is also making it clearer when your own change is clumsy, irrelevant or impossible in relation to what others did in the meantime.

A more down-to-digital-earth usage of the idea would be the domain of cooperative literate programming (once upon a time invented by Knuth). Not, of course, for exchanging tons of ugly Perl or C++ code, but maybe for something that is close to compact, readable (!) and (perhaps) executable specifications (I use Python for that purpose as my poetic vehicle). A typical granule size for a change within a piece of Python text would be a method of a class.

But, I do realise that making usage of TWiki practical for such a domain also requires an import/export facility to a specific syntax-directed editor.

There is an ironic paradox in saying: "... because TWiki is a tool for building consensus, 'literary' quality is not considered important ...". The more a document represents consensus within a group of human actors, the more precise it will have to be (balancing on the edges of the ambiguities of natural language), and the harder it becomes to make any change without introducing new and increasingly time-consuming 'conflicts'. Just look (as an extreme) at the production process of documents by politicians, lawyers or committees.

-- TheoDeRidder - 26 Nov 2000

Has anyone updated GetAWebAddOn? Thanks.

-- MartinCleaver - 27 Jun 2002

A few more comments to some older arguments from Joachim and Nicolas in TWikiWithCVS.

  1. "I still don't see how a CVS backend helps with release tagging" -- it should save you implementation effort, because all the looping and low-level work is done within CVS. And (unless there is RcsLite) it saves tons of fork()s.
  2. "CVS' replication facilities don't give TWiki a serious advantage" -- replicating CVS repositories is a pain; that's true. But I'm dreaming of distributed check-out copies. TWiki could serve, say, 95% of requests from these local copies. For the other 5% (query versions + diffs) you need CVS. And it handles the propagation of changes up to the central repository (which could be your existing CVS / file / non-TWiki server) and back down to the other distributed servers.
  3. "Using CVS for TWiki replication is non-trival" -- granted. Q: is there a trivial way to implemented custering, replication, offline usage?

-- PeterKlausner 8.Aug.2002

I agree with PeterKlausner's comments above and support his desire to support replication using a CM system. This issue has also been discussed in TWikiWithClearCase...

-- ThomasWeigert - 08 Aug 2002

What about using a backend that is smarter about doing distributed version control? I'm thinking of something like arch or BitKeeper. These guys have a lot of tools for allowing an isolated repository to make changes independent of the "main" repository and also sync them up.

-- DougAlcorn - 09 Aug 2002

I've got the method I outlined above up and running:

  • Use data and code separation
  • Check in the entire contents of each web's directory (using -kb for the contents of the pub subdirectory)
  • Use RCS directories rather than have ,v and .txt in the same directory - this avoids a number of problems.

Once checked in, and checked out on a satellite site, the RCS directory on each local node means that people have safety of edits and no reliance on external change control (CVS ignores an RCS directory).

Then periodically satellite sites can do a "cvs update; ci -l changed topics; cvs ci" type cycle, picking up the changes from the other sites participating in that discussion. One very key point: no single wiki becomes a single point of failure - or control. (Essentially this brings a usenet-like quality to the wiki.)
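
A sketch of that periodic cycle for one web, assuming the RCS-directory layout described above; the web path and log messages are placeholders, and conflict handling is left to the admin as discussed below:

    #!/usr/bin/perl
    # Sketch of the periodic satellite sync for one web: pull other sites'
    # changes, check locally edited topics into RCS, then commit to CVS.
    # The web path and messages are placeholders.
    use strict;
    use warnings;

    my $webdir = '/var/www/twiki/data/Main';
    chdir $webdir or die "$webdir: $!";

    sub run { system(@_) == 0 or warn "command returned non-zero: @_\n"; }

    run('cvs', 'update', '-d');                     # pick up the other sites' edits

    # "ci -l changed topics": check in topics that differ from their head revision.
    for my $topic (glob '*.txt') {
        next unless -e "RCS/$topic,v";
        # rcsdiff exits 1 when the working file differs from the head revision.
        my $changed = system(qq{rcsdiff -q "$topic" >/dev/null 2>&1}) >> 8;
        run('ci', '-l', '-msync: offline edit', $topic) if $changed == 1;
    }

    run('cvs', 'commit', '-m', 'periodic sync from satellite');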

Currently synchronisation is a manual process - largely to check out what problems arise. (The biggest expected problem is conflicts in metadata - conflicts in data are much easier to deal with: render them differently, and then leave them to the user to resolve. The wiki can even mail the user who made the last local edit telling them to resolve the conflict.)

Still a number of issues to be dealt with, but that's the case with any new feature. (If wiki was a "replacement" for email - not sure I buy that TBH - then this works pretty well as a "replacement" for usenet.) If you weren't using data and code separation then this functionality would be significantly harder.

Finally got round to it smile I've wanted this feature for almost 4 years!

-- MS - 22 Jan 2004

Michael, this is fabulous - but you hedge around the conflict resolution problem a bit. I assume that's because you don't know yet what's in the can of worms. Conflict resolution scares me a bit; having used clearcase for years, I know the problems of merging conflicting updates.

Personally I'd prefer that conflicts were resolved by the person doing the latest checkin, and the online wiki identified clearly as the "master". Here's a possible user story:

  1. User1 goes on business trip, takes laptop with offline wiki
  2. User2 makes changes in main wiki while they are gone
  3. User1 also makes guerilla changes to the same topics
  4. User1 and User2 change the same form fields in the same topics
  5. User1 returns home, plugs in and hits the "synchronize" button on their offline wiki
    1. they are presented with a page containing a list of topics containing conflicts. All these topics are locked in the online wiki.
    2. non-conflicting changes are performed silently and the online and offline versions are synchronised
    3. when they click on one of the conflicts, they are taken to a page that displays the online page version open for edit, with their offline version next to it. This gives them the opportunity to merge their changes into the online version.
    4. if they ignore the conflicts, then the offline wiki is updated to reflect the online version, i.e. their changes are lost.

-- CrawfordCurrie - 23 Jan 2004

I was brief because it was relatively late (and because the conflict problem isn't actually that bad) smile

The master wiki in this implementation (after all, I've done read-only distributed TWiki for a long while now) is actually the CVS repository. This doesn't "belong" to any of the wiki servers directly. It's likely to be hosted by one of them, but could equally just be a SourceForge or Savannah repository. None of the wiki servers uses the CVS repository as its local store. (i.e. no special TWiki::Store::CVS modules get written.)

There's two kinds of conflicts that need consideration:

  1. Conflicts in topic text lines
  2. Conflicts in META lines

Scenario:

  • User is editing on their laptop (or independent server), makes changes
  • They (or their local admin, or cron) performs an update/checkin RCS/checkin CVS cycle.
  • This may leave them locally with conflicts. (Standard CVS issue)

CVS marks up the conflicts using the usual:

    <<<<<<<
    foo
    =======
    bar
    >>>>>>>
    
type syntax.

In topictext
This can simply be handled by rendering the text differently until the conflict is resolved. A clash is a clash of edits - a difference of opinion. This means that both edits have equal weight. (Hence why CVS couldn't resolve the issue smile ) Either the person who was the source of the problem resolves the conflict, or they don't. If they don't, it's just two differing opinions that both get checked in next time. Unlike conflicts in programming code, presenting conflicting arguments in text is positive.
  • Pages where it might cause problems are those with complex searches - pages that are part of an application, defining CSS in the topic and so on. The simplest solution there would IMO be to take the same approach as for META lines (below) and provide a META tag to mark pages as "conflict sensitive".
In META lines
This causes problems for the TWiki code, which means some work needs to be done. The simplest thing that could possibly work is the following (a rough sketch follows after this list):
  1. Assume the master server is always correct.
    • Take the conflicting META lines that are already in the CVS repository as "correct" - use that to resolve the conflict
  2. Tag the page as a conflict needing resolution in metadata
  3. Place the conflicting metadata into storage (the topic text is simplest) and checkin.
  4. The ideal scenario here is to allow the existence of alternates - which then allows the topic to have multiple sets of metadata.
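
A rough sketch of steps 1-3, assuming that after a cvs update the repository's side of each conflict block is the second half (between ======= and >>>>>>>); the %META:CONFLICT% marker is invented for this illustration, not a real TWiki tag:

    #!/usr/bin/perl
    # Speculative sketch of META-line conflict cleanup after "cvs update":
    # keep the repository half of a conflict block when %META: lines are
    # involved, park the losing half in the topic text, and flag the topic.
    use strict;
    use warnings;

    my $file = shift or die "usage: $0 topic.txt\n";
    open my $in, '<', $file or die $!;
    my @lines = <$in>;
    close $in;

    my (@out, @parked, $in_conflict, @mine, @repo, $side);
    for my $line (@lines) {
        if ($line =~ /^<{7}/) { $in_conflict = 1; $side = 'mine'; @mine = @repo = (); next; }
        if ($in_conflict && $line =~ /^={7}/) { $side = 'repo'; next; }
        if ($in_conflict && $line =~ /^>{7}/) {
            if (grep { /^%META:/ } (@mine, @repo)) {
                push @out, @repo;                  # repository wins for META lines
                push @parked, @mine;               # keep the other half for the user
            } else {
                push @out, @mine, @repo;           # plain text: keep both opinions
            }
            $in_conflict = 0; next;
        }
        if ($in_conflict) { push @{ $side eq 'mine' ? \@mine : \@repo }, $line; next; }
        push @out, $line;
    }

    if (@parked) {   # invented marker: flag the topic as needing resolution
        push @out, "\n%META:CONFLICT%\n", "<verbatim>\n", @parked, "</verbatim>\n";
    }
    open my $outfh, '>', $file or die $!;
    print {$outfh} @out;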

What this is likely to cause is the same situation that happens normally with CVS: the person who has to resolve the conflict is normally the one who performed the last edit. In either case the wiki server that discovers conflicts can find out who last edited the page locally, and mail them a link to the conflict to resolve.

That might sound too simple, but in practice I'm pretty sure it is that simple. (As I say, I'm doing this as a manual process at present until I work out the kinks.)

Setting it up isn't really that difficult either. (Conversion from RCS files in the same directory as the text to RCS files in an RCS directory is the bit with the most faff in fact)

Consider the degenerate case - each TWiki is used by one user only. In that situation each TWiki is just a text editor for that user. The conflict resolution scenario is exactly the same as standard CVS. So, far from not being aware of what's in the can of worms, I'm fully aware of what's in this well-known can of worms you can get at any handy-dandy SourceForge project wink (After all, I'm mirroring TWiki.org changes into "my" wiki codebase and get conflict resolution issues on a regular basis - as a result I'm pretty certain this approach will a) work b) prove fruitful.)

Note for anyone confuddled: this isn't using CVS as a local store backend - it's using CVS as a synchronisation tool, allowing global histories of edits from different wiki servers to be stored in one place, and local histories of edits to be stored locally. All the wiki servers are created equal, with no "master" wiki. (What's the master server for usenet?)

(Corollary: anyone who puts their web into this shared environment has to release absolute control over the system.)

-- MS - 23 Jan 2004

Great that there is progress on this important feature, although I've only been waiting for 3 years smile Still, there are a few problems to solve:

  1. TWiki assumes that the checked-out .txt and the latest repository revision are identical. If you fiddle with the .txt (as I often do), then the revision and diff display get inconsistent, confusing innocent users
  2. By synchronising only the current checked-out copy, you lose all history in the local TWiki's RCS
  3. Even if you don't lose a revision, you lose the precise time and author info
  4. Conflict resolution basically has to happen on shell level, not from within TWiki.
  5. When you have fixed the conflicts under the hood, nobody will see this in WebChanges, unless you reload and save the topic from TWiki. From each TWiki!
  6. A simple cvs ci or update transfers the whole tree, even if there are 0 changes! Depending on your wiki size (those evil attachments, you know...) and bandwidth, this may be a problem.

I still think that it is important to "teach" TWiki the notion of a checked-out, work-in-progress revision that is not in the repository, as laid out in PageCheckoutCheckinStrategy. This would make it much easier to do conflict resolution.

Update: I didn't see Michael's latest comment before saving. Good point about explaining the degenerate case of 1 TWiki per user. Still, I would really love to use CVS not only as a synchronisation aid, but to integrate it into TWiki's revision history. Colas' RcsNonStrictLocking hack looks like it would make it easy enough to change TWiki's basic behaviour on checked-out copies.

-- PeterKlausner - 23 Jan 2004

If you can think of a sensible way of dealing with checking in the specific version histories, I'm interested in that. However, you need to think of each edit location as essentially a branch, and each checkin as a merge (each has its own version control, separate store etc.). Generally the CVS trunk doesn't contain all the changes that have been made in all the branches. (After all, how do you deal with the fact that version histories will conflict - editing is no longer a linear sequence. One approach is to go to the branch - having a pointer to the appropriate branch, if it's not behind a firewall, is a possibility - assuming you know where that branch lives.)

I've performed this setup now on 4 wiki servers (one on my laptop, two at work, one public), with the laptop one taking feeds of different webs for local editing. This means I can work on several independent wikis all locally on my machine and periodically down/upsync automagically from/to the correct server. (In a similar way that you can run a news server and take feeds from several news machines.)

-- MS - 23 Jan 2004

The "it's all good stuff" approach with the document text I like, and can see working. It's the changes to meta-data, specifically form content, that worries me. Reiterating your alternatives

    1. Assume the master server is always correct.
    2. Tag the page as a conflict needing resolution in metadata
    3. Place the conflicting metadata into storage (the topic text is simplest) and checkin.
    4. The ideal scenario here is to allow the existence of alternates - which then allows the topic to have multiple sets of metadata.
IMHO 2 and 4 are too complex. As you correctly point out, you are not trying to build a highly sophisticated CM system here. Just as well, given the difficulty of getting even a trivial change into the code. 1 is too simplistic. 3 is interesting. Metadata need to be treated as atomic units of change, in which case a simple strategy of "last come, best served" would work well. Consider the scenario where two guerrillas make changes to the same form field; who takes precedence? He who synchronises last? She with the latest date stamp?

Your degenerate case is amusing. It makes you think of other possibilities, like only synchronising a subset of your content.

-- CrawfordCurrie - 23 Jan 2004

A completely alternative approach to implementing this would be for all the wiki servers to listen to an NNTP feed looking for edit messages. When a wiki server has an edit and the lock is released (whatever the method), a diff is performed, and a unified diff with sufficient context is posted to a newsgroup. When the message is picked up by a remote site from the NNTP feed, it is merged into that site's local edits. The messages could be stamped with the username (+ local edit page) of the editor.
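
A sketch of the posting half of this idea using the Net::NNTP module from libnet; the news server, group naming scheme, headers and the pre-edit snapshot file are all invented for the example:

    #!/usr/bin/perl
    # Sketch: post a unified diff of a just-edited topic to a newsgroup so
    # other wiki servers can pick it up. Server, group and header names are
    # assumptions for this example, as is the .prev pre-edit snapshot.
    use strict;
    use warnings;
    use Net::NNTP;

    my ($web, $topic, $editor) = ('Main', 'SomeTopic', 'JaneDoe');

    # Diff the pre-edit snapshot against the saved topic with generous context.
    my $diff = `diff -U 10 "data/$web/$topic.txt.prev" "data/$web/$topic.txt"`;
    exit 0 unless $diff;                          # nothing changed, nothing to post

    my $nntp = Net::NNTP->new('news.example.com')
        or die "cannot reach news server\n";
    my @article = (
        "From: $editor\@wiki.example.com\n",
        "Newsgroups: wiki.changes.$web\n",
        "Subject: [wiki-edit] $web.$topic\n",
        "\n",
    );
    push @article, "$_\n" for split /\n/, $diff;  # the unified diff as the body
    $nntp->post(\@article) or die "post failed\n";
    $nntp->quit;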

Problems in this scenario revolve around lost NNTP messages, which can be a problem for some sites. This would cause the topic text on different sites to be permanently out of sync. The nice thing about this solution, however, is that it becomes completely decentralised.

-- MS - 24 Jan 2004

http://mailsync.sourceforge.net/ does a three way merge of IMAP repositories...

-- MartinCleaver - 14 Jan 2005

All this discussion is quite interesting, but I don't yet see it concretise into an implementation, be it a plugin or anything else, or even just a detailed functional/technical specification. With a previous group I have been working with, we had also planned something like this, but then we had no resources to implement our ideas. We did not want to use anything other than RCS, whereas the installation, especially on the satellites, had to be as easy as possible; conflicts had to be solved on the satellite and had to be presented in TWiki format in the edited topics (a non-technical user uses TWiki, (s)he does not want to learn another symbolic language for the conflicts; these are seen as rubbish created by TWiki, so it would be quite surprising for the user to see that TWiki uses non-TWiki syntax).

Anyhow: the interaction is described in some detail on the page TWiki:Codev.TWikiWithIntermittentConnectivity?rev=1.1, but it is written in Italian. I'm slowly translating it into English. At the moment only the first lines are readable (the rest has been automatically translated), and the summary can also be used.

Since the basic ideas are probably clear enough, I wonder if there are any reactions (let alone the cry: "implement it and translate the docs!").

-- MarioFrasca - 10 Mar 2005

I was thinking about this style of interaction: I want to take advantage of the presence of a web server also on the satellites, a server containing pages and scripts that can be called by the planet after it has received a set of patches.

One point of attention is that I'm not trying to keep version numbers on the satellites: all edits get shrunk into one version increment on the planet, which is then returned to the satellite. This is because the planet gets the differences all at once, after a possible local history of (non-conflicting) modifications.

| satellite | action | planet |
| presend | asks the user what s/he wants to synchronize | |
| send | checks the last connection time and that no topic has unresolved conflicts, then prepares the rcs.diff file and sends it to the central patchweb script | |
| local: cycle (until planet calls) | receives the rcs.diff file, applies the patch, invokes the satellite's getupdate | patchweb |
| local: still cycling; remote: getupdate | receives the complete differences between the last connection and the present, including its own data; this synchronizes topic versions, updates the last connection time, and sets a flag so the cycling can end | |
| search?... | shows the list of conflicting topics; these are kept in a recognizable format so that they cannot later be sent unresolved to the planet | |

-- MarioFrasca - 12 Mar 2005

There has been some discussion on Meatball about the concept of a DistributedWiki - an internet wiki where nodes are local computers that can drop in and out. Not quite the same concept, but you might find the discussion amusing, though rather academic.

These are all problems that have been addressed in the design of distributed CM systems, such as ClearCase and SVN. Can you leverage anything from there? For example, if each of the satellites had an SVN checkout area, then a merge with the central server would be equivalent to an SVN checkin. Conflicts would have to be immediately flagged for resolution. A satellite patching in to the central server would effectively just do an svn update.

  • Um, SVN is not a distributed VC system. SVK is (somewhat). -- AndyGlew Thu Oct 5 2006

-- CrawfordCurrie - 12 Mar 2005

Well, maybe it can be interesting; SVN does not look too different from CVS to me, and I have no idea what this ClearCase is. I should definitely give it a look...

My aim is to keep it as simple as possible; maybe you disagree given the results, but I was thinking of two different add-ons on top of a complete TWiki installation: after you have everything set up, you install a satellite or planet plugin. The satellite plugin has to be configured to know where the planet is (in the first instance I would not make the planet too complex). Then we already have RCS, and we already have a web/TWiki server on both ends; from there follows my proposed design. I think we have been abstract enough for long enough to make a concrete attempt. I recall having performed the steps described by hand; it did not behave that badly, but it has to be automated. When we have a running prototype, we can make it better.

-- MarioFrasca - 12 Mar 2005

See ReplicationTechnologiesForTWiki

a) At least one key member of my team just plain out-and-out refuses to use TWiki because it is not accessible while offline and disconnected. He flies all the time, can write Word on the plane, can write email on the plane. He can even write to Windows shared directories on the plane, courtesy of replication and synchronization. But he can't write wiki easily while disconnected.

b) If looking to change the underlying version control from RCS to CVS or SVN, I recommend looking further, to one of the new generation of distributed VC systems. BitKeeper is the best example, but unfortunately has weird licensing. Open source distributed VC systems include GNU Arch, Monotone and Darcs. Linus has started writing his own, Git, now that Linux is no longer allowed to use BitKeeper.

I recommend against CVS and SVN, since they are both centralized VC systems. Maybe it would be neat to have TWiki use CVS or SVN for other reasons; kwiki does nearly all VC systems. But CVS and SVN will not solve the offline wiking problem.

SVN's distributed stepchild SVK may be reasonable. However, if TWiki goes that way, I will stop using TWiki.

Reason: the big reason why I use TWiki rather than Zope/Plone/Zwiki is that TWiki uses ordinary files. I can use standard UNIX tools like grep to manipulate them. And, yes, I frequently grep the ,v files. SVN uses a database, and SVK is built on top of SVN.

c) I started writing a tool to merge RCS ,v files. The basic idea is to compute a content-based hash for each version, and then line up versions between ,v files. Most of the time the merge is easy; however, when there have been conflicting edits you need to do a merge in much the way CVS does. And probably TWiki would want a WUI (Web User Interface) to do such merge editing.

I.e. I merge the version history in the ,v files. The actual leaf content merge is separate.
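
Not Andy's actual tool, but a rough sketch of the line-up step he describes, using rlog/co from RCS plus Digest::MD5; the merge of the aligned histories is not shown:

    #!/usr/bin/perl
    # Sketch of the ",v line-up" idea: hash the content of every revision in
    # two ,v files and report which revisions are identical. Only the
    # alignment step is shown; the actual history/content merge is not.
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    sub revision_hashes {
        my ($rcsfile) = @_;
        my %hash_of;                               # revision number => content hash
        my $log = `rlog $rcsfile`;
        while ($log =~ /^revision (\d[\d.]*)/mg) {
            my $rev = $1;
            $hash_of{$rev} = md5_hex(scalar `co -q -p -r$rev $rcsfile`);
        }
        return %hash_of;
    }

    my ($a_file, $b_file) = @ARGV;
    my %a = revision_hashes($a_file);
    my %b = revision_hashes($b_file);
    my %b_by_hash = reverse %b;                    # content hash => revision in B

    for my $rev (sort keys %a) {
        if (my $match = $b_by_hash{ $a{$rev} }) {
            print "$a_file $rev == $b_file $match\n";
        } else {
            print "$a_file $rev has no counterpart in $b_file\n";
        }
    }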

It would probably be too much of a hassle to get Intel to allow my code to be shared, and my code isn't good enough to warrant the hassle. But the basic idea is straightforward, albeit slow.

d) given a completed RCS ,v file merge, and a leaf content merge WUI, then TWiki would only need to be extended to make the RCS ,v files available to download, when an offline wiki is being spawned.

-- AndyGlew - 14 Jul 2005

Erm, just one thing....

SVN does not use a database. It can be configured to use either BerkeleyDB, which many people would argue is not quite a database, OR (more significantly) a filesystem-based store (though I think it's binary). Either way, the svnadmin dump command will give you a text-based dump of the repository, which makes it less likely that some bugger edits your ,v files (which sadly happened with CVS).

That said, it's a fair point that the ,v file system is unfortunately a useful bonus (I have also done similar things).

Do arch, monotone, darcs or git (Linus') actually use plain text files for the versioning info? You seem to be implying that they do...

In any case, it is extremely unlikely that we would drop support for RCS (though its quality would depend on those that use it to test it) when we start to add other backends, as my plan is to allow different backends to be used on the same TWiki. That way freeform data would be in some text format (with distributed data possible), some of the data would be in non-distributable database-type stores (potentially where TWiki is not the main client of that data), and other data sources would be a mix of the two.

All that said, any functionality in TWiki is highly dependent on the code, documentation and testing contributions of those users that need that functionality - and this is a classic case where everyone who says they have to have it has not actually done it. (I don't need it, but I sure would love to see it.)

-- SvenDowideit - 14 Jul 2005

I think the main obstacle is that the best distributed VC system is BitKeeper, which is not really open. The others - darcs, arch, monotone, git - aren't really ready for prime time.

  • Thu Oct 5 2006: Git is now ready for prime time.

arch is widely used, but is highly idiosyncratic.

-- AndyGlew - 16 Jul 2005

ReplicationUsingUnison might work.

-- MartinCleaver - 26 Jul 2005

SyncContrib more or less implements this.

You'll only find it in SVN for now.

-- MartinCleaver - 23 Nov 2005
