
ScalableWiki - Overview

I'm new to TWiki, but I have a lot of Perl programming experience, particularly with complex back-end systems. My immediate observation is that TWiki's RCS-based storage is utterly unscalable, and I'm really surprised that it relies so little on a back-end SQL server.

TWiki is of particular interest to me because I run a fairly large vBulletin-based message board, and TWiki as an add-on makes a lot of sense. VB has no easy way to make "home pages" for the various forums, and TWiki makes them trivial to add.

To be more specific, the site I run is a basketball message board with a forum for each team. Each forum has been skinned for its team: the Bulls forum is red/black with a nifty Bulls banner at the top, the Celtics forum has a green color scheme with a nifty Celtics banner at the top, and so on.

If I add TWiki to the site, there'd be a TWiki Bulls home page and a TWiki Celtics home page. Each home page would have the team's roster, record (the Bulls are 22-28 as I write this), next game, stats, and so on, plus a link to the forum. Ideally, it'd also have links to the last 'n' posts in the forum, which would require some fancy SQL queries.
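For the "last n posts" links, the query is basically a join from posts to threads, filtered by forum and sorted by date. Here's a minimal sketch under assumptions: the `thread` and `post` tables and their columns are hypothetical stand-ins (the real vBulletin schema may differ), and Python's built-in sqlite3 stands in for MySQL so the example runs on its own.

```python
import sqlite3

# Hypothetical, simplified message-board schema; real vBulletin tables may differ.
SCHEMA = """
CREATE TABLE thread (threadid INTEGER PRIMARY KEY, forumid INTEGER, title TEXT);
CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER, dateline INTEGER);
"""


def latest_posts(conn, forumid, n):
    """Return (postid, thread title, dateline) for the n newest posts in one forum."""
    sql = """
        SELECT p.postid, t.title, p.dateline
        FROM post AS p
        JOIN thread AS t ON t.threadid = p.threadid
        WHERE t.forumid = ?
        ORDER BY p.dateline DESC
        LIMIT ?
    """
    return conn.execute(sql, (forumid, n)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO thread VALUES (?, ?, ?)",
                 [(1, 10, "Trade rumors"), (2, 10, "Game thread"), (3, 20, "Other team")])
conn.executemany("INSERT INTO post VALUES (?, ?, ?)",
                 [(100, 1, 1000), (101, 2, 1005), (102, 3, 1010), (103, 1, 1020)])
print(latest_posts(conn, forumid=10, n=2))
# prints [(103, 'Trade rumors', 1020), (101, 'Game thread', 1005)]
```

On a real board you'd want an index covering the date column so the sort doesn't scan every post in the forum.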

I'm prepared to hack all over TWiki to make this possible, including querying the VB3 user database for authentication.

This leads to the issue of scalability. An ideal scalable VB3 setup has a dedicated MySQL server (or cluster) with "satellite" WWW servers running the PHP scripts (sorry for mentioning a dirty word! :-) ). This lets the MySQL server be highly optimized for running one dedicated application (hardware is rather cheap), and you can add WWW servers as load demands. Having at least two WWW servers also makes the site more reliable, since one can be rebooted while the other keeps serving pages.

The issue with TWiki is that it relies on RCS, which is a filesystem-based revision control system. If you want to distribute a wiki across multiple WWW servers, you have to replicate the RCS files, the .txt files, and the templates across all of them. It's doable, but messy, and you certainly take a performance hit if you want all the filesystems to stay in sync at all times. For example, if you replicate with a cron task that runs once a minute, one WWW server might have a newer file for up to a minute while the other is serving out-of-date content.

One possibility might be to move to a server-based revision control system, like CVS or SVN. I don't think that is the ultimate answer, because each WWW server still has to know "when" a document was modified in order to know "when" to update its working copy. I can see a moderately sized wiki spending a lot of time thrashing the disk doing these updates often enough to keep the content fresh.

So it ultimately boils down to using a backend SQL server, like MySQL.

Disk space is cheap! For a 3M-post VB3 message board, the entire MySQL database is on the order of 3GB or less. Going to a really trivial revision control scheme that simply keeps a full copy of every version of a document isn't going to be that expensive. And if you really want to, you can compress the old versions with bzip2 to save space, since they're not accessed very often.
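A minimal sketch of that full-copy scheme, with Python's sqlite3 standing in for MySQL: every save stores the complete new text, and the version it supersedes is bzip2-compressed at that moment. The `revision` table and the `save_topic`/`load_topic` names are made up for illustration.

```python
import bz2
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("""
    CREATE TABLE revision (
        topic TEXT NOT NULL,
        rev INTEGER NOT NULL,
        compressed INTEGER NOT NULL,  -- 1 if body holds bzip2 data
        body BLOB NOT NULL,
        PRIMARY KEY (topic, rev)
    )""")


def save_topic(conn, topic, text):
    """Store a full copy of the new version; compress the one it replaces."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(rev), 0) FROM revision WHERE topic = ?", (topic,)).fetchone()
    if latest:
        # The current latest version is always stored uncompressed.
        (body,) = conn.execute(
            "SELECT body FROM revision WHERE topic = ? AND rev = ?", (topic, latest)).fetchone()
        conn.execute(
            "UPDATE revision SET compressed = 1, body = ? WHERE topic = ? AND rev = ?",
            (bz2.compress(body), topic, latest))
    conn.execute(
        "INSERT INTO revision (topic, rev, compressed, body) VALUES (?, ?, 0, ?)",
        (topic, latest + 1, text.encode("utf-8")))
    conn.commit()
    return latest + 1


def load_topic(conn, topic, rev=None):
    """Return the text of a revision (default: newest), or None if missing."""
    if rev is None:
        row = conn.execute(
            "SELECT compressed, body FROM revision WHERE topic = ? ORDER BY rev DESC LIMIT 1",
            (topic,)).fetchone()
    else:
        row = conn.execute(
            "SELECT compressed, body FROM revision WHERE topic = ? AND rev = ?",
            (topic, rev)).fetchone()
    if row is None:
        return None
    compressed, body = row
    return (bz2.decompress(body) if compressed else body).decode("utf-8")
```

For example, saving BullsHome twice yields revisions 1 and 2, and `load_topic(conn, "BullsHome", 1)` transparently decompresses the old copy. Keeping the newest version uncompressed means the common case, viewing the current page, never pays for decompression.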

Once you have an SQL back end, you can authenticate users against a table, store the templates in the database, and provide all kinds of nifty sorting on the data displayed. This leads to some badly needed features I'd like to see in TWiki. I'm also looking at TWiki as a replacement for SharePoint Services in my software development company...
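Authenticating against a table could look roughly like this. It's a sketch under assumptions: the `wiki_user` table, the PBKDF2-SHA256 hashing, and the iteration count are illustrative choices, not anything TWiki or vBulletin actually does; if you authenticate against the VB3 user database instead, you'd have to match its password scheme. sqlite3 again stands in for MySQL.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed work factor; tune for your hardware

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE wiki_user (login TEXT PRIMARY KEY, salt BLOB, pwhash BLOB)")


def _hash(password, salt):
    """Derive a password hash with a per-user random salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def add_user(conn, login, password):
    salt = os.urandom(16)
    conn.execute("INSERT INTO wiki_user VALUES (?, ?, ?)", (login, salt, _hash(password, salt)))
    conn.commit()


def authenticate(conn, login, password):
    """True if the login exists and the password matches."""
    row = conn.execute(
        "SELECT salt, pwhash FROM wiki_user WHERE login = ?", (login,)).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_hash(password, salt), stored)
```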

So a key feature I'm looking for, and will ultimately implement if we go with TWiki, is "lists." A list is an SQL table whose structure is defined through the web interface and which can be both populated and queried from a WWW browser. Two examples of lists we rely on heavily are "tasks" and "bugs."
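Here's a rough sketch of what "web-definable" could mean, assuming the browser submits a list name plus (field, type) pairs. Because those names end up inside SQL statements, where placeholders can't be used for identifiers, they're checked against a strict pattern first; the row values themselves still go through placeholders. The `list_` table prefix, the allowed type names, and the function names are all hypothetical, and sqlite3 stands in for MySQL.

```python
import re
import sqlite3

# Hypothetical mapping from the field types a web form might offer to SQL types.
ALLOWED_TYPES = {"text": "TEXT", "number": "INTEGER", "date": "TEXT"}
# Names are interpolated into SQL, so only accept plain identifiers.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")


def _check(name):
    if not IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")


def create_list(conn, name, fields):
    """Create table list_<name> from [(column, type), ...] submitted by a browser."""
    _check(name)
    cols = ["id INTEGER PRIMARY KEY"]
    for col, kind in fields:
        _check(col)
        if col.lower() == "id" or kind not in ALLOWED_TYPES:
            raise ValueError(f"invalid field: {col!r} {kind!r}")
        cols.append(f'"{col}" {ALLOWED_TYPES[kind]}')
    conn.execute(f'CREATE TABLE "list_{name}" ({", ".join(cols)})')


def add_item(conn, name, item):
    """Insert one row (a dict of column -> value); values use placeholders."""
    _check(name)
    for col in item:
        _check(col)
    cols = ", ".join(f'"{col}"' for col in item)
    marks = ", ".join("?" for _ in item)
    conn.execute(f'INSERT INTO "list_{name}" ({cols}) VALUES ({marks})', list(item.values()))


conn = sqlite3.connect(":memory:")  # stand-in for MySQL
```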

Task lists might be implemented on a per-project basis or on a global basis. For example, if you have a project page, there might be a dedicated list of tasks for that project; OR you might have a global list of tasks and have this page just sort/filter the tasks and show those just for the one project. We lean toward the latter, because then a user can have a "view" of the global task list that includes just his tasks - which can be for multiple projects.
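The "one global list, many views" approach is mostly a matter of optional WHERE clauses. A minimal sketch, assuming a hypothetical `task` table with made-up rows (sqlite3 standing in for MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("""CREATE TABLE task (
    id INTEGER PRIMARY KEY, project TEXT, owner TEXT, title TEXT, done INTEGER DEFAULT 0)""")
conn.executemany("INSERT INTO task (project, owner, title) VALUES (?, ?, ?)", [
    ("WikiPort", "myke", "Move topics to MySQL"),
    ("WikiPort", "ann", "Port templates"),
    ("BugTracker", "myke", "Design bug schema"),
])


def task_view(conn, project=None, owner=None):
    """Open tasks from the single global list; None means 'don't filter on this'."""
    sql = "SELECT id, project, owner, title FROM task WHERE done = 0"
    params = []
    if project is not None:
        sql += " AND project = ?"
        params.append(project)
    if owner is not None:
        sql += " AND owner = ?"
        params.append(owner)
    sql += " ORDER BY project, id"
    return conn.execute(sql, params).fetchall()
```

A project page would call `task_view(conn, project="WikiPort")`, while a user's personal page would call `task_view(conn, owner="myke")` and see open tasks across every project.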

The same concept works for "bugs." Each software project has its own list of bugs; or you have a global bugs list (for all projects) and you provide "views" of that list per project. Then you can query the global bugs list to analyze the commonality of certain kinds of bugs across projects, issues that are related to specific target platforms, and so on.
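With a single global bug list, the cross-project analysis is just a GROUP BY. A sketch with a hypothetical `bug` table and made-up sample rows (sqlite3 standing in for MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE bug (id INTEGER PRIMARY KEY, project TEXT, category TEXT, platform TEXT)")
conn.executemany("INSERT INTO bug (project, category, platform) VALUES (?, ?, ?)", [
    ("WikiPort", "encoding", "linux"),
    ("BugTracker", "encoding", "windows"),
    ("BugTracker", "locking", "linux"),
    ("WikiPort", "encoding", "windows"),
])


def common_categories(conn, min_projects=2):
    """Bug categories seen in at least min_projects distinct projects."""
    sql = """
        SELECT category, COUNT(DISTINCT project) AS projects, COUNT(*) AS bugs
        FROM bug
        GROUP BY category
        HAVING COUNT(DISTINCT project) >= ?
        ORDER BY projects DESC, bugs DESC
    """
    return conn.execute(sql, (min_projects,)).fetchall()


def bugs_per_platform(conn, project=None):
    """Bug counts per target platform, optionally for one project's view."""
    sql = "SELECT platform, COUNT(*) FROM bug"
    params = ()
    if project is not None:
        sql += " WHERE project = ?"
        params = (project,)
    sql += " GROUP BY platform ORDER BY platform"
    return conn.execute(sql, params).fetchall()
```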

I'm really just scratching the surface of the possibilities once you go SQL backend.

SQL makes your metadata trivial to implement. SQL allows rich, complex TwikiApplications, like a sortable basketball stats app. SQL makes a caching scheme trivial: documents are stored both "raw" and "formatted", so you don't spend a lot of CPU converting TwikiFormatting codes into HTML every time you display a page. SQL obviously provides a very rich mechanism for searching and sorting. And SQL makes session tracking trivial.

(I'd like to be notified when just this page is edited or changed; SQL makes that REALLY easy, too.)
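Per-page notification reduces to a subscription table keyed by topic. A sketch, assuming a hypothetical `subscription` table (sqlite3 standing in for MySQL); actually sending the mail is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE subscription (topic TEXT, email TEXT, PRIMARY KEY (topic, email))")


def subscribe(conn, topic, email):
    # The primary key makes repeated subscriptions harmless.
    conn.execute("INSERT OR IGNORE INTO subscription VALUES (?, ?)", (topic, email))


def subscribers_for(conn, topic):
    """Everyone watching exactly this topic."""
    cur = conn.execute("SELECT email FROM subscription WHERE topic = ? ORDER BY email", (topic,))
    return [email for (email,) in cur]


def on_topic_saved(conn, topic, editor):
    """Build (recipient, message) pairs; mail delivery is not shown."""
    return [(email, f"{topic} was changed by {editor}") for email in subscribers_for(conn, topic)]
```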

Another major benefit of an SQL back end is that you avoid the server overhead of forking and exec'ing external programs (like RCS).

While implementing something like this, you'd want to design the software to take advantage of mod_perl. For example, it's expensive to set up and tear down the TCP connection to the MySQL server on every page hit; keeping the connection open and persistent via global variables is an important strategy. Another strategy is to cache the templates in global variables, so you can do a quick "content modified" query and do nothing more if the templates haven't changed (they rarely do).

TWiki is already VERY cool, but I think that it can be even better.

Regards to all

-- Contributors: MykeSchwartz

Discussion

-- MykeSchwartz - 13 Feb 2006

BTW, another issue I found with using RCS...

Unix filesystems have serious performance issues once a directory holds about 4096 or more files. Since every topic has both a .txt file and a .txt,v RCS history file, a Web with 2048 topics (and no attachments) already reaches that mark and becomes a hog on the system running TWiki.

A scheme for fixing this is to use the first two characters of a WikiWord as two levels of subdirectories:

WikiWord.txt would end up going into:

data/webname/W/i/WikiWord.txt

data/webname/W/i/WikiWord.txt,v

and attachments:

data/webname/W/i/WikiWord/
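The path mapping is simple enough to show directly. A sketch (the `topic_paths` name is hypothetical; it uses `posixpath` so the result always has "/" separators):

```python
import posixpath


def topic_paths(data_dir, web, topic):
    """Map a topic to data/<web>/<1st char>/<2nd char>/... as proposed above."""
    if len(topic) < 2:
        raise ValueError("topic name needs at least two characters")
    bucket = posixpath.join(data_dir, web, topic[0], topic[1])
    return {
        "text": posixpath.join(bucket, topic + ".txt"),
        "history": posixpath.join(bucket, topic + ".txt,v"),  # RCS file
        "attachments": posixpath.join(bucket, topic) + "/",
    }
```

With this layout, the 2048-topic Web above is split across many small directories instead of one 4096-entry one. One caveat: WikiWords start with a capital letter, so the first level typically has at most 26 buckets, and how evenly topics spread depends on how their names are distributed.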

-- MykeSchwartz - 13 Feb 2006

Topic revision: r1 - 2006-02-13 - MykeSchwartz
 