Sometimes we get asked how well TWiki scales. This blog post compiles scalability-related information so that you can plan your TWiki deployment effectively.
TWiki was designed as an enterprise wiki from its inception. You will find features specifically designed to support large deployments; other wiki engines have a different focus and may lack some of them. Wikis typically flourish at the grassroots level. Once on the radar screen of the CTO/CIO, grassroots wikis often get consolidated into a central TWiki. That is when scalability comes into play. Key scaling features of TWiki:
Multiple webs (workspaces):
You can create as many webs as you need. Some large TWiki deployments have over 1000 webs. Think of a web as a wiki within TWiki. Each team can get their own wiki. People need to register only once, then they can create content in their own space. If needed, you can link across webs, such as to reference a registered user or an entry in the Glossary web.
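For example, cross-web links in TWiki markup can look like this (the web, topic and user names below are illustrative):

   * Main.JaneSmith links to a user's profile page in the Main web
   * Glossary.ServiceLevelAgreement links to a topic in the Glossary web
   * [[Support.WebHome][Support web home]] uses bracket notation with a custom link label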
Fine grained access control:
You can create TWikiGroups and restrict view and edit access to content based on those groups. Although it is possible to restrict access at the topic (page) level, it is typically done at the web level for ease of administration.
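As a sketch, assuming a group called EngineeringGroup and a web called Engineering, the group topic in the Main web and the web-level restriction in the web's WebPreferences could look like this:

In Main.EngineeringGroup:
   * Set GROUP = Main.JaneSmith, Main.JackMiller
   * Set ALLOWTOPICCHANGE = Main.EngineeringGroup

In Engineering.WebPreferences:
   * Set ALLOWWEBVIEW = Main.EngineeringGroup
   * Set ALLOWWEBCHANGE = Main.EngineeringGroup

The ALLOWTOPICCHANGE setting in the group topic ensures that only group members can edit the group membership itself.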
Authentication:
In a large deployment it is advisable to authenticate users against your directory server, such as Active Directory or LDAP. That reduces the support workload for registration and login questions.
File attachments:
TWiki has a per-topic namespace for file attachments. That means if one team uploads a file called inventory.xls to their team page, and another team uploads a file of the same name to a different page, the two files will not collide. Try that with MediaWiki or other wikis.
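To illustrate, with the default store each topic has its own attachment directory, so the two files end up in different locations (web and topic names are examples):

   pub/Sales/InventoryPage/inventory.xls
   pub/Engineering/InventoryPage/inventory.xls

In a topic, %ATTACHURL%/inventory.xls always points to that topic's own attachment directory.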
You can limit the maximum size of attachments that can be uploaded. This can be done for the whole site and also on a web level.
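One way to do this is the ATTACHFILESIZELIMIT preference, set site-wide in TWikiPreferences or overridden in a web's WebPreferences; the value is in KB, and the 10 MB shown here is just an example:

   * Set ATTACHFILESIZELIMIT = 10240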
Web 2.0 is all about user generated content. TWiki is a web application platform where you can install ready-made applications. For example, check out the BlogAddOn, the TWikiDotNetForumAppAddOn and other TWiki extensions in the Plugins web.
Create your own applications:
TWiki goes beyond Web 2.0: the TWiki platform is about user generated application logic. Your users can create situational applications that solve specific business needs, such as a bug tracker, an employee news portal, TWiki's Support web and more. You do not need to be a programmer; all application logic is done in TML (TWiki Markup Language) using TWiki forms, reports and, optionally, some HTML and JavaScript.
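As a small sketch of such an application, a bug tracker could consist of a form definition topic and a report page; the web name, form fields and values below are made up:

A BugForm topic defining the form:

| *Name* | *Type* | *Size* | *Values* | *Tooltip message* |
| Status | select | 1 | Open, Fixed, Closed | Current state of the bug |
| Priority | select | 1 | Low, Medium, High | Urgency of the bug |

A report of all open bugs, using a formatted query search:

%SEARCH{ "Status='Open'" web="Bugs" type="query" nonoise="on" header="| *Bug* | *Priority* | *Last updated by* |" format="| [[$web.$topic][$topic]] | $formfield(Priority) | $wikiusername |" }%

The form is enabled for the web with a WEBFORMS setting in its WebPreferences; each bug report topic then carries the form data that the search reports on.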
The IT department is in charge of the wiki dial tone and wants to have some control over the wiki deployment. With TWiki you let users experiment in a controlled environment. That is, IT can get the dreaded "shadow IT" under control.
Integrate:
TWiki has a plugin API and ready-made plugins to connect to external databases. That way you can run a query against MySQL or another RDBMS and display the result in TWiki pages. This is useful to show CRM data to sales teams and bug trends to engineering teams. See Extensions:database.
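The exact markup depends on the plugin you install, so the following is only an illustrative sketch; the tag name, connection name and table columns are hypothetical and not the API of a specific extension:

<!-- hypothetical tag, check the chosen database plugin for its actual syntax -->
%DATABASE_SQL{ connection="crm" query="SELECT account, stage, amount FROM opportunities WHERE region='EMEA'" format="| $account | $stage | $amount |" }%

Typically the pattern is the same: the administrator configures a named database connection, and a TWiki variable embeds the query plus a row format for rendering the result as a table.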
Server Selection, Caching, Load Balancing
Plan for adequate server hardware when you deploy TWiki. The following are ballpark figures for a sample TWiki deployment serving 1,000 employees and 50,000 pages:
Enterprise class Linux
Dual core CPU 2.6 GHz
2 GB RAM
RAID 1 or RAID 5 for redundancy
Dual power supply for redundancy
Plan disk space:
Page content: 15 MB per 1,000 pages (yes, MB, not GB)
File attachments: 1 GB per 1,000 pages
For the 50,000 page example above, that works out to roughly 750 MB of page content and about 50 GB of attachments.
If you have a high read-to-write ratio (such as a TWiki on the public internet), consider a caching solution and/or a load-balanced setup.
For high-traffic sites it is possible to put TWiki on a load-balanced setup. Here is an example:
Cisco ACE load balancer.
Three web servers.
NAS storage back-end.
The web servers share page data, file attachments and log files on the NAS.
In the early days, TWiki.org ran on a load-balanced server setup while hosted at SourceForge. Now it runs on a single, aging server. The TWiki community plans to move TWiki.org back to a load-balanced setup, which should improve performance considerably.
Scalability of Search
TWiki uses the Unix grep command to search content in real time. This enables flexible and powerful searches without a separate indexing step, which is important for TWiki applications. Search is covered in SearchHelp, VarSEARCH, QuerySearch, FormattedSearch and SearchSupplement.
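For example, a formatted search embedded in a page runs at view time against the current content; the web name here is an example:

%SEARCH{ "scalability" web="Support" scope="text" type="keyword" nonoise="on" format="   * [[$web.$topic][$topic]]: $summary" }%

This lists matching topics in the Support web as a bullet list with a short summary of each hit.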
The real-time search has a performance impact: searching all webs in a TWiki site with more than 50,000 pages can be slow. If you have a TWiki deployment of that size, it is advisable to index the TWiki content with a search engine. This can be done with a commercial search engine such as the Google Search Appliance, or with an open source one. TWiki currently has three open source search engine integrations: SearchEnginePluceneAddOn, SearchEngineSwishEAddOn and SearchEngineKinoSearchAddOn. See more at Extensions:search.
To scale the queries of TWikiForms-based TWiki applications, look into DBCacheContrib and DBCachePlugin.
Flat File Back-end
Some people express concerns that TWiki's flat file back-end does not scale well. We know of a number of large TWiki deployments that have over 300,000 pages (such as at Yahoo), over 1,000 webs (such as at a major telco company), and over 10,000 users (such as at a major financial institution in the USA).
A flat file based storage back-end has several advantages:
Simple installation
Simple backup and restore
Simple migration of content between TWiki installations (think of grassroots wiki consolidations, spin-offs and acquisitions of companies)
Well understood caching and replication technologies available
Resilient to data corruption
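For reference, with the default RCS-based store a web is simply a directory tree of plain files, which is what keeps backup and migration simple; the web, topic and file names below are examples:

   data/Engineering/WebHome.txt       current topic text and metadata
   data/Engineering/WebHome.txt,v     RCS revision history of the topic
   pub/Engineering/WebHome/logo.png   a file attachment of that topic

Copying the data and pub directories of a web moves its complete content, history and attachments to another TWiki installation.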
Some scaling factors:
TWiki scales well with the number of webs, e.g. it does not matter much if you have 3 webs or 3,000 webs.
TWiki has a practical limit on the number of pages per web. You will see a performance impact if you have more than 20,000 pages in a single web. The exact threshold depends on the file system and its configuration, on the server's I/O bandwidth and on the installed memory.
TWiki scales well with the number of registered users. We have not tested the upper limit. It is also feasible not to register users in TWiki, e.g. to rely solely on LDAP login.
As stated above, performance can be addressed with caching and/or load balancing.
The TWiki community is working on a pluggable storage back-end, see TWikiRoadMap.
(This post is based on Peter Thoeny's blog post on Scalability of TWiki, also posted as a supplemental document at TWikiScalability.)
Comments
MichaelDaum - 26 Mar 2008:
See also the discussions on the upcoming TWikiCache, a built-in caching infrastructure with dependency tracking. Compared to other caching solutions, this one aims at (1) correctness: never deliver outdated wiki content, and (2) transparency: no extra provision is needed by the wiki author to get content cached. The TWikiCache will be part of TWiki-5.0. There are backports available for 4.1.x and 4.2.x as well.
MartinSeibert - 26 Mar 2008:
Very valuable information, Peter. Thank you.
ColasNahaboo - 02 Apr 2008:
On already available performance enhancements, there are also:
using long Expires headers on images, CSS and JavaScript files
offloading these files to servers specialized in serving static files
And there are other solutions in the works, such as Michael's one, or Gilmar's work as part of TWikiStandAlone.
PeterThoeny - 03 Apr 2008:
Colas, good additional info. Possibly update the supplemental document TWiki.TWikiScalability?