
TWiki Success Story of Morgan Stanley:
A Globally Replicated Intranet TWiki With 30,000 Users, 500,000 Topics, Across 3 Regions

By Hideyo Imazu, Vice President, Enterprise Infrastructure, Morgan Stanley, 2011-09-28

Preface

This document is a technical description of the TWiki installation on the internal network of Morgan Stanley (called "the firm" hereafter). It focuses on how to host literally thousands of webs on servers distributed around the globe.

To achieve our goals, we enhanced the TWiki core, implemented custom plug-ins, and enhanced existing plug-ins. Those enhancements have not been contributed to the TWiki community yet, but they are going to be.

At a glance

Usage figures

Our TWiki installation has:
  • 3000 top level webs and 5000+ webs in total including sub webs
  • 500,000+ topics
  • 4+ million page views per month by human users, 10+ million page views by intranet search crawlers and other programs
  • 30,000+ unique visitors out of 60,000 employees world-wide each month
    • We have a firm-wide single sign-on environment, which is integrated into our TWiki installation. As such, we don't have TWiki user registration; everybody in the firm can use TWiki in the same manner as a registered user on a vanilla TWiki installation with registration enabled
  • 5,000 unique contributors editing topics or uploading attachments each month
  • 350 GB of content
TWiki as-is is not intended to become this big. Enhancements and modifications were made to TWiki to cope with this level of scale, which will be discussed later in this document.

Replicated globally

We enhanced TWiki so that read-only copies of a TWiki web can reside on different TWiki sites. Together with an enhancement to copy TWiki webs automatically and manually, we operate three TWiki installations in New York, London, and Tokyo as a single coherent installation.

The three TWiki installations are loosely coupled. Each works fine in isolation, and daily content synchronization keeps them holding the same content.

[Figure: three-way-replication.png]

The enhancements for TWiki web replication will be discussed later.

Hardware resources used

Each region has:
  • 2 to 4 commodity x86 servers (2 to 4 cores per server) in two different data centers. All TWiki servers in a region are behind a load balancer
  • 500 GB of highly available network attached storage (NAS) for content and work areas

[Figure: twiki-plant-unit.png]

A web load balancer and network attached storage that withstand the loss of a single data center are not commodity items, but they are available off the shelf and require no custom engineering to use.

People working on enhancement, maintenance, and operation

Unlike SharePoint and other enterprise tools, TWiki has a very low maintenance cost. Nobody in the firm is dedicated to TWiki. A handful of people work on enhancement, maintenance, and operation, each spending a fraction of their time on TWiki. All combined, TWiki takes half to three quarters of one person's time.

History of TWiki in the firm

Before diving deeper, let us briefly describe the history of TWiki in the firm.

We started using TWiki in 2004, at which point it was a vanilla installation. There was one TWiki site for the entire firm.

Then in 2007, the current incarnation was engineered based on TWiki 4.1.2. The features described in this document were implemented then and have been evolving since. We plan to renovate it based on TWiki 5.1 or later.

Use cases

Use cases shape users' expectations and requirements. Though this document is technical in nature, some space needs to be spent on use cases.

Typically in the firm, TWiki webs are used for the following purposes.

  • Operation manuals of hardware and software
  • Internal software product documents
  • Information sharing within a team
This in part stems from the user demographics - 90% IT and 10% others.

Each use case is elaborated hereafter.

Operation manuals

Being a financial company, the firm has quite a few software systems to run the business - trading, trade confirmation, risk analysis, financial settlement, customer relationship management, to name a few. And there are sizable hardware resources to support them. Many of them have their operation manuals on TWiki.

TWiki is not a trading or settlement system, so a TWiki failure doesn't make the firm lose money directly. But some mission-critical systems depend on TWiki for their operation. In day-to-day operation and when switching to new releases, the availability of operation manuals is crucial. A very good track record of availability and global content replication make TWiki appealing.

There was a time when TWiki needed to be shut down completely world-wide. That shutdown was pushed back several times because of important weekend operations of some systems which depend on TWiki.

Software product documents

Having a sizable number of internally developed software systems, we have a sizable collection of supporting libraries shared among multiple systems and teams. We also have various software tools for development, debugging, testing, versioning/revision control, etc. used widely in the firm.

There are teams developing/supporting those libraries and tools. They tend to provide documentation of their products on TWiki.

Team webs

Many development, engineering, and operation teams have their team webs. In some cases, they have a web for team internal use and another web for their clients/users.

In those webs, typically, meeting minutes and progress reports are shared among the team members.

Compared to other content management platforms

High availability

Our TWiki has a very good track record of availability. It kept working as usual when one data center was lost. When another data center outage happened, TWiki stopped working because the network attached storage was lost despite its high-availability design. But even then, TWiki recovered faster than the other content management platforms.

There are internal users who don't want to put their content on content management platforms other than TWiki because the other platforms don't provide high enough availability.

We don't do a lot to make our TWiki highly available, but the things we do for availability are discussed below.

Cost effective

As mentioned above, our TWiki uses a small amount of hardware and human resources compared to the other content management platforms used in the firm.

Caveat

It's not fair to compare TWiki with other content management systems only in terms of availability and cost, because those systems provide various features not available on TWiki.

There is no panacea - a platform is either feature-rich but expensive and not so fault tolerant, or simple but cheap and robust.

For high availability

From this point on, specific configuration choices and enhancements made to our TWiki are discussed. First, those for high availability.

Our TWiki site is built in the following manner to increase availability.

  • A TWiki site consists of two or more servers in different data centers and behind a web load balancer
  • Content is hosted on highly available network attached storage, which is shared by all TWiki servers constituting a TWiki site. The storage provides nightly snapshots, so even if a topic or web is deleted, you can recover it from a snapshot.

Having multiple servers should also contribute to performance, especially in NY where the intranet search engine crawler hits TWiki a lot.

To host thousands of webs

Enhancements made for scaling boil down to the following purposes.
  • To make webs self-service as much as possible and reduce the need for TWikiAdminGroup member intervention
  • To prevent clutter from accumulating
Individual enhancements are described in the context of those purposes hereafter.

Clear web ownership

Inevitably, some webs are abandoned for various reasons. If each web has a definite owner, it's much easier to delete abandoned webs.

We introduced web meta data for this and other purposes. The meta data is stored in an SDBM file, and the TWiki core is enhanced to refer to it. A custom plug-in (%QINFO{...}%) and an add-on (cgi-bin/qinfo) have been made to read and update the meta data. Among other things, each web has one and only one internal mailing list owning it.

When the mailing list associated with a web becomes unreachable, the web is regarded as orphaned and will be deleted after a grace period, during which each topic of the web shows "This web is orphan. Please claim ownership to avoid deletion".

All members of a web's owner mailing list have full access to the web regardless of access restriction settings. Without this feature, an owner might lock themselves or somebody else out when access restrictions are set wrongly, which would require manual intervention by TWikiAdminGroup members.

Locking down the Main web and introducing personal webs

The Main web needs to be writable to host user topics. That means a user can create as many pages there as they like, which leads to a lot of abandoned pages without clear ownership. To prevent that, we modified the access control logic so that a user can create only their personal topic and web left bar topic on the Main web.

There are users who want more than one page for their personal use. For that need, we introduced user personal webs, which are sub webs of the User web named User/WikiName. The owner of a user personal web has full access to the web regardless of access restriction settings; we enhanced the access restriction logic accordingly.
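The owner-override rule for personal webs can be sketched in a few lines. This is a hypothetical simplification, not the actual access-control code, and `acl_allows` stands in for whatever the normal ACL evaluation would return:

```python
def has_access(user, web, acl_allows):
    """Return whether `user` may access `web`.

    Personal webs are sub webs of User, named User/<WikiName>; their
    owner always has full access, so a misconfigured ACL can never
    lock the owner out of their own web.
    """
    if web == f"User/{user}":
        return True          # owner override for personal webs
    return acl_allows        # otherwise, the normal ACL result stands
```

The same shape of override applies to owner mailing lists on ordinary webs: the owner check runs before, and trumps, the regular access restriction settings.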

Rotating Sandbox and Trash webs

The Sandbox web needs to be writable by anybody to serve its purpose, so it can accumulate a lot of clutter over time. To prevent that, our custom daily clean-up script rotates the Sandbox every weekend: Sandbox1 is deleted, Sandbox is renamed to Sandbox1, and an empty Sandbox web is created.

The Trash web is similar. It accumulates deleted topics and attachments. In the same daily clean-up script, we rotate the Trash daily: Trash10 is deleted, Trash9 is renamed to Trash10, ..., Trash is renamed to Trash1, and an empty Trash web is created.

TWiki groups can be defined anywhere

On vanilla TWiki, TWiki groups need to be defined on the Main web. Since we lock down Main, TWiki groups need to be definable on any web, so we enhanced TWiki accordingly.

This makes things cleaner because a TWiki group topic is linked naturally. Let's say ATWikiGroup is defined on the AWeb. On the AWeb.TopicOne topic, ATWikiGroup becomes a link to that topic - and that's where the TWiki group is defined.

Disable the "for all webs" options

It's not practical at all to do things to all of the 5000+ webs, so the "for all webs" options on the "More actions" pages are removed.

Along the same lines, %WEBLIST{...}% no longer returns the entire list of webs. Instead, %QINFO{...}%, which refers to the web meta data, is used.

For content replication

Web meta data revisited

The web meta data introduced primarily for web ownership also records the master site location of each TWiki web. Each of the three TWiki sites in Morgan Stanley has its location code - na for New York (North America), eu for London (Europe), and as for Tokyo (Asia).

The web meta data is updated in New York and replicated to the other regions so that all the sites work coherently.

Read-only mirror webs

At least in TWiki version 4.1.2, TWiki.pm has read-only web support code, which is not fully functional. We enhanced it so that we can have read-only webs, making it refer to the web meta data. If a web's master location is not the site's own location, the site regards the web as a read-only mirror.

The view template is enhanced so that the Edit link on a mirror web points to the Edit URL of the master site. This way, we hide the replication to some extent.
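The mirror decision and the redirected Edit link boil down to a comparison against the location code in the web meta data. A hedged sketch, with invented URLs and a simplified meta data record:

```python
def is_mirror(web_meta, site_location):
    # A web is a read-only mirror here if its recorded master site
    # (na, eu, or as) is not this site's own location code.
    return web_meta["master"] != site_location

def edit_url(web_meta, site_urls, web, topic):
    # On a mirror, the Edit link points at the master site's edit URL,
    # hiding the replication from the user. URL shape is illustrative.
    base = site_urls[web_meta["master"]]
    return f"{base}/edit/{web}/{topic}"
```

On the master site itself, `is_mirror` is false and the ordinary local Edit link is rendered.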

Web mirroring mechanism

We've introduced the MirrorAddOn, which provides the mirror script to replicate web content. It's used both from the command line and from a browser.

From the command line, mirror typically copies all webs whose master is not local. It uses the rsync command for copying.

From a browser, mirror copies only one specified web from its master site.
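The command-line pass of such a mirror script might be structured as follows: for every web whose master is elsewhere, build an rsync invocation pulling that web's data from the master site. This is a hypothetical sketch; host names, paths, and the meta data layout are invented:

```python
def mirror_commands(webs_meta, local_site, master_hosts,
                    remote_root="/twiki/data", local_root="/twiki/data"):
    """Build one rsync command per web whose master is not local.

    `webs_meta` maps web name to its meta data record, `master_hosts`
    maps a site code to the host to pull from. Paths are illustrative.
    """
    cmds = []
    for web, meta in sorted(webs_meta.items()):
        if meta["master"] == local_site:
            continue                      # we are the master; nothing to pull
        host = master_hosts[meta["master"]]
        # -a preserves permissions and timestamps; --delete removes
        # local files that no longer exist on the master.
        cmds.append(["rsync", "-a", "--delete",
                     f"{host}:{remote_root}/{web}/", f"{local_root}/{web}/"])
    return cmds
```

The browser-triggered variant would build the same command for just the one requested web.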

Statistics

To produce global statistics reflecting accesses to all TWiki servers in all regions, access log files are mirrored so that each of the three TWiki sites has the other sites' access log files. This is done in the daily clean-up script.

Then the statistics script is executed, reading the current log files of all sites.
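Aggregating the mirrored logs amounts to counting events across all sites' files. A minimal sketch of the counting step, using an invented log-line format (real TWiki log formats differ):

```python
import re
from collections import Counter

# Assumed line shape for illustration: "<date> view <Web>.<Topic>"
LINE = re.compile(r"view\s+(\S+)")

def global_view_counts(log_texts):
    """Sum view counts per topic over the log texts of all sites."""
    counts = Counter()
    for text in log_texts:                # one entry per site's log file
        for line in text.splitlines():
            m = LINE.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts
```

Because every site holds every other site's logs, the same script run anywhere yields the same global figures.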

Conclusion

TWiki out of the box is not tailored to host thousands of webs in one installation. TWiki has code for content replication, but we had to tailor and enhance it to our needs. It's not designed specifically for high availability either.

Thanks to the way TWiki is written and its relative simplicity, it was relatively easy to make it highly available, have it host thousands of webs, and have them replicated globally.

-- HideyoImazu - 2011-09-28

Comments

Hideyo-san, thank you very much for this comprehensive success story!

All: See also JitendraKavathekar's blog about this subject.

-- PeterThoeny - 2011-10-03

Very good practice and experience, thank you for your sharing.

-- FeiYaoJun - 2011-10-20

very good experience

-- caler lee - 2013-04-01

is there some new case about tWiki?

-- caler lee - 2013-04-01

TWikiGuest: I removed the off-topic link.

-- Peter Thoeny - 2017-09-14

Topic revision: r7 - 2017-09-14 - PeterThoeny
 