
I've been doing some preliminary research on improving the functionality of the search in TWiki. As I see it there are two routes we can follow:

  1. Use Microsoft Index Server: http://www.microsoft.com/NTServer/fileprint/exec/feature/indexfaq.asp


Microsoft® Index Server is a full-text indexing and search engine for Internet Information Server and Windows NT® Server. It allows any Web browser to search documents for key words, phrases, or properties such as author's name.

Index Server 2.0 is available as part of Internet Information Server 4.0. They are both available as part of the Windows NT 4.0 Option Pack and can be downloaded for free.

Index Server is capable of indexing textual information in any document type through content filters. Filters are provided for HTML, text, and Microsoft Office documents, and Adobe has released one for pdf documents (see http://www.adobe.com/support/techdocs/12b42.htm). Application developers can provide support for any other document by writing to the open IFilter interface. An IFilter knows how to read a file and extract the text. This text can then be indexed.
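The filter idea described above (one extractor per document type that feeds plain text to an indexer) can be sketched in a few lines. This is only an illustration of the concept: the names `FILTERS`, `text_filter`, `html_filter` and `index_documents` are made up for this example and have nothing to do with the real IFilter COM interface.

```python
import os
import re
from collections import defaultdict

# Hypothetical filters: each one turns raw bytes into plain text.
def text_filter(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")

def html_filter(data: bytes) -> str:
    # Crude tag stripping, good enough for a sketch.
    return re.sub(r"<[^>]+>", " ", data.decode("utf-8", errors="replace"))

# Hypothetical registry mapping a file extension to its filter.
FILTERS = {".txt": text_filter, ".html": html_filter}

def index_documents(docs):
    """Build an inverted index {word: set of names} from {name: bytes}."""
    index = defaultdict(set)
    for name, data in docs.items():
        extract = FILTERS.get(os.path.splitext(name)[1].lower())
        if extract is None:
            continue  # no filter registered for this document type
        for word in re.findall(r"\w+", extract(data).lower()):
            index[word].add(name)
    return index

docs = {
    "a.txt": b"TWiki search notes",
    "b.html": b"<p>Search <b>attachments</b></p>",
    "c.bin": b"\x00\x01",
}
index = index_documents(docs)
print(sorted(index["search"]))  # ['a.txt', 'b.html']
```

Supporting a new document type then only means registering one more extractor; the indexing loop itself stays unchanged.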


Pros:

  • We are already using NT Server 4.0
  • A Microsoft product probably works best with Microsoft Office documents

Cons:

  • It would limit us to a Microsoft server and a Windows box. In the future we may want TWiki to run on Unix. { I'm using TWiki on Linux, and I suspect others are too. -- HendrikBoom - 31 Mar 2001 }
  • We would probably be violating the GNU license agreement (http://www.gnu.org/copyleft/gpl.html) by integrating TWiki with commercial software.

  2. Find some free software to do the job for us


  • Adheres to the GNU GPL
  • Perl interface
  • Platform independent


I searched SourceForge for appropriate projects (this link will get you started http://sourceforge.net/softwaremap/trove_list.php?form_cat=93&discrim=15,176,235).


  1. Swish-E: http://sunsite.berkeley.edu/SWISH-E/
  2. Swish++: http://homepage.mac.com/pauljlucas/software/swish/
  3. Yahoo eGroup relating to 2: http://www.egroups.com/group/swish/

Swish++ seems to be the most advanced version of the product, but it is written in C++. Swish-E is still Perl, I believe, and may be easier to integrate.

Attachment indexing in Swish++ is done by a separate package called extractor, which is loosely based on the Unix strings command. This needs further investigation.
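For anyone unfamiliar with the Unix strings command: it pulls runs of printable characters out of arbitrary binary data, and that output could then be fed to an indexer. Below is a minimal sketch of that technique only, not of the Swish++ extractor itself; the function name `extract_strings` and the minimum run length of 4 are assumptions made for this example.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]

# Made-up sample standing in for the contents of a binary attachment.
sample = b"\x00\x01Hello TWiki\x00ab\x02attachment.doc\xff"
found = extract_strings(sample)
print(found)  # ['Hello TWiki', 'attachment.doc']
```

The short run `ab` is dropped because it falls below the minimum length, which is how this approach filters out most binary noise.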

-- TristanClarke - 06 Mar 2001

See also: http://twiki.org/cgi-bin/view/Support/SearchAttachments

-- MartinCleaver - 29 Mar 2001

Search Tools Product Listings see http://www.searchtools.com/tools/tools.html

-- TWikiGuest - 29 Mar 2001

Wow, thanks. Says on http://www.nwc.com/1120/1120f1side3.html :

Open-source search-engine efforts are alive and well. They may not be quite up to the highest capacity, but they are almost infinitely configurable. Most are light on user interface for the search administrator and require command-line and config-file control, but they are powerful and flexible.

Ht://Dig (www.htdig.org) was developed at San Diego State University and released under the GPL (GNU General Public License). It's a solid search engine for Unix machines. Ht://Dig's robot crawls links on Web pages and the indexer interfaces with open-source code to read PDF and Microsoft Word files. The response is fast and the relevance ranking reasonable (it will improve in version 3.2, under development as of this writing). There are several options for "fuzzy" text searching, including soundalikes, common word endings and synonyms. The system has required configuration files for administration, but an open-source ConfigDig interface now provides access via Web browsers to many of the features. The core development team is active and responsible, and there's a friendly community mailing list.

UdmSearch (search.mnogo.ru) was also developed under the GPL and can index Web pages, FTP sites, Usenet newsgroups and local files. For index storage, it can use almost any SQL server. Because it was developed in the Russian Federation of Udmurtia, it's very good at supporting multiple character sets and languages. In addition to simple HTML forms, UdmSearch provides PHP3, PERL and C CGI access to the search engine, offering significant flexibility and options in arranging search results. There's an active online community, and the developers answer questions quickly.

-- MartinCleaver - 29 Mar 2001

UdmSearch is now known as mnoGoSearch and the Windows version is $$. However, Namazu looks promising: there are Linux, Unix and Windows versions, and it's GPL. Some of its bits are even written in Perl.

-- DavidLeBlanc - 31 Mar 2001

Well, we've tried long and hard to get HtDig to work on Windows but gave up due to difficulties with Cygwin. In short:

HtDig was incompatible with the latest version of Cygwin. Unfortunately, because of the Cygwin distribution method, it proved difficult to obtain previous versions of the software. These problems should not occur on the UNIX platform as the reliance on Cygwin is removed.

We did, however, get MicrosoftIndexServer to successfully work. The developer said:

MS Index Server is possibly a better choice as it is better featured 'out of the box' (for example, support for incremental indexing, and indexing of office documents is built-in).
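To make "incremental indexing" concrete: the indexer only reprocesses documents that changed since the last run instead of rebuilding everything. The sketch below shows one simple way to decide what needs work, by comparing modification-time snapshots. It is not a description of how Index Server does it; `plan_incremental_update` and the sample paths are hypothetical.

```python
def plan_incremental_update(previous, current):
    """Compare {path: mtime} snapshots and return (to_index, to_remove)."""
    # New files and files whose mtime differs must be (re)indexed.
    to_index = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    # Files that disappeared must be dropped from the index.
    to_remove = sorted(p for p in previous if p not in current)
    return to_index, to_remove

# Made-up snapshots from two successive indexing runs.
last_run = {"WebHome.txt": 100, "Search.txt": 100, "Old.doc": 90}
this_run = {"WebHome.txt": 100, "Search.txt": 140, "New.pdf": 150}

to_index, to_remove = plan_incremental_update(last_run, this_run)
print(to_index)   # ['New.pdf', 'Search.txt']
print(to_remove)  # ['Old.doc']
```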

I'll make enquiries to make the case to release the code...

-- MartinCleaver - 23 Oct 2001

This will shortly appear as IndexServerSearchForMsIisAddOn

-- MartinCleaver - 20 Jan 2002

I found an interesting free Perl-based search engine at http://www.perlfect.com/freescripts/search/ . You can customize which folders to index. It can even search PDF files!

-- PeterMasiar - 23 Jan 2002

Namazu is a very good open source search engine. Some of it is written in C, but a lot is also written in Perl. Namazu indexes file systems as opposed to web sites. An example can be seen here: http://www.searchlores.org/cgi-bin/search?query=

It can be downloaded from http://www.namazu.org/

-- PeterMarelas - 31 Jan 2002

I like the example page -- the ability to search an "implied" field (created by, for example "Subject:") is very nice (and very AskSam-like).
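A rough sketch of what searching an "implied" field could look like: lines of the form `Name: value` are treated as fields, and a query is matched against one field only rather than the whole text. This illustrates the idea and is not Namazu's implementation; `FIELD_LINE`, `extract_fields`, `search_field` and the sample documents are invented for this example.

```python
import re

# A line such as "Subject: something" is treated as an implied field.
FIELD_LINE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")

def extract_fields(text):
    """Return {field name: [values]} for every 'Name: value' line."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields.setdefault(match.group(1).lower(), []).append(match.group(2).strip())
    return fields

def search_field(docs, field, term):
    """Return names of documents whose given field contains term."""
    term = term.lower()
    return sorted(
        name
        for name, text in docs.items()
        if any(term in value.lower() for value in extract_fields(text).get(field.lower(), []))
    )

docs = {
    "m1": "Subject: Search attachments\nBody text about indexing",
    "m2": "Subject: Release plan\nSearch is mentioned only in the body",
}
hits = search_field(docs, "Subject", "search")
print(hits)  # ['m1']
```

Only `m1` matches, because the word appears in the body of `m2` but not in its Subject line.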

-- RandyKramer - 31 Jan 2002

Also see SearchEnginePluceneAddOn and SearchEngineSwishEAddOn

-- JosMaccabiani - 16 Aug 2005

See also SearchEngineKinoSearchAddOn

-- MarkusHesse - 16 Sep 2007

Topic revision: r14 - 2007-09-16 - MarkusHesse
Copyright © 1999-2015 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.