The kinosearch script uses the template kinosearch.pattern.tmpl (if you use the pattern skin). There is also a KinoSearch topic with a form ready to use with the kinosearch script.
If you have enabled the SearchEngineKinoSearchPlugin, you can use the rest handler either from a URL (this works only for a smaller TWiki) or from the command line. The syntax is identical to that of the kinosearch script.
https://twiki.org/cgi-bin/rest/SearchEngineKinoSearchPlugin/search
cd twiki/bin ; ./rest SearchEngineKinoSearchPlugin.search
kinosearch script. The installation instructions are detailed below.
The %KINOSEARCH{...}% variable is handled by the SearchEngineKinoSearchPlugin.
%KINOSEARCH{ "search string" format="..." }%
| "..." | Search string |
| format="..." | Format of a search hit. Supported variables: • $icon - an icon showing the file type (when showing attachments) • $match - the TWiki name of the page being displayed • $locked - shows whether the page is locked • $texthead - summary text |
To use the integrated SEARCH, set $TWiki::cfg{RCS}{SearchAlgorithm} = 'TWiki::Store::SearchAlgorithms::Kino'; (a setting in the Store settings section of configure).
TWiki will then use the KinoSearch index for every built-in search (including WebSearch) that it can handle; for regex searches it falls back to the Forking search algorithm.
If you want TWiki's WebSearch to also show attachment results (when you select the 'Both body and title' option), you also need to set {SearchEngineKinoSearchAddOn}{showAttachments}=1 and add kino to the front of your SKIN setting.
This feature is experimental because KinoSearch does not do partial matching: searching for TAG will not match text like %TAG{"something"}%, only instances where the word TAG is separated by whitespace. TWiki's SEARCH, by contrast, expects full partial (substring) matching.
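To make the difference concrete, the following shell sketch counts only whitespace-delimited occurrences of a word, which is the matching behaviour described above. It is a simplified model for illustration only (the function name count_whole_word is hypothetical); it is not KinoSearch's actual analyser.

```shell
# Illustration only: a model of whole-word (whitespace-delimited) matching.
# This is NOT KinoSearch's analyser; it just mimics the behaviour described above.
count_whole_word() {
    # Put each whitespace-separated token on its own line, then count the
    # lines that equal the search word exactly (fixed string, whole line).
    # "|| true" keeps the exit status 0 when the count is zero.
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | grep -cxF -- "$2" || true
}

count_whole_word 'uses %TAG{"something"}% here' TAG   # prints 0
count_whole_word 'the TAG word and TAG again' TAG      # prints 2
```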
text: before the word.)
Use the + and - operators, as in Google query syntax, to mark required and forbidden terms, respectively.
Use <field>: to search a specific field, where <field> is the field name in the metadata (for instance, author).
text:kino or just kino
text:"search engine" or just "search engine"
author:MarkusHesse (note: to search for a TWiki author, use their login name)
form:WebFormName to get all topics with that form attached.
CONTACTINFO:MarkusHesse if you have declared CONTACTINFO as a variable to be indexed
type:doc to get all attachments of the given type
web:Main to get all the topics in a given web
topic:WebHome to get all the topics of a given name
+web:Sandbox +topic:Test to get all the topics containing "Test" in their titles and belonging to the Sandbox web.
The kinoindex, kinoupdate and kinosearch scripts will be deprecated over time in favour of the restHandlers, both for security reasons and to ease compatibility with TWiki 5.0.
cd twiki/kinosearch/bin ; ./kinoindex
https://twiki.org/cgi-bin/rest/SearchEngineKinoSearchPlugin/index
cd twiki/bin ; ./rest SearchEngineKinoSearchPlugin.index
The kinoupdate script uses each web's .changes file to find out about topic modifications.
It also keeps a .kinoupdate file in each web directory, storing the timestamp of the script's last run on that web.
When the script is executed, it first checks whether there have been any topic updates since its last run.
The updated topics are removed from the index and then reindexed.
cd twiki/kinosearch/bin ; ./kinoupdate
https://twiki.org/cgi-bin/rest/SearchEngineKinoSearchPlugin/update
cd twiki/bin ; ./rest SearchEngineKinoSearchPlugin.update
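The incremental logic described above can be sketched in shell as follows. This is not the kinoupdate implementation: the function name is hypothetical, and the file formats are assumptions (each .changes line is taken to be tab-separated with the topic name in the first field and an epoch timestamp in the third; .kinoupdate is taken to contain a single epoch timestamp).

```shell
# Sketch of the incremental-update check (NOT the real kinoupdate code).
# ASSUMED formats:
#   <web>/.changes    : tab-separated lines, topic in field 1, epoch time in field 3
#   <web>/.kinoupdate : a single epoch timestamp of the last run
changed_since_last_update() {
    webdir=$1
    last=0                                   # no .kinoupdate yet: every change counts as new
    if [ -f "$webdir/.kinoupdate" ]; then
        last=$(cat "$webdir/.kinoupdate")
    fi
    [ -f "$webdir/.changes" ] || return 0    # no changes recorded for this web
    # Print each topic changed after the last run once; these are the topics
    # that would be removed from the index and reindexed.
    awk -F'\t' -v since="$last" '$3 > since { print $1 }' "$webdir/.changes" | sort -u
}
```

For example, changed_since_last_update /path/to/twiki/data/Main would list the Main topics due for reindexing; after reindexing, a script following this model would write the current time (date +%s) back to .kinoupdate.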
# m h dom mon dow command
35 * * * * cd /path/to/your/twiki/bin ; ./rest SearchEngineKinoSearchPlugin.update
You can also optionally use SearchEngineKinoSearchPlugin's updateHandlers to update the index automatically whenever a topic is modified (or an attachment uploaded) by setting
{SearchEngineKinoSearchPlugin}{EnableOnSaveUpdates} to true in the Extensions section of configure. Warning: this can make topic saves and attachment uploads unacceptably slow, because the index update happens before the browser operation has completed.
The attachment types to index are set with KINOSEARCHINDEXEXTENSIONS. You can copy and paste the following lines into your Main.TWikiPreferences topic:
* KinoSearch settings
* Set KINOSEARCHINDEXEXTENSIONS = .pdf, .html, .txt, .doc, .xls, .docx, .pptx, .xlsx,
or whichever extensions you want. Extensions without a dedicated stringifier are treated as ASCII files. If needed, you can add more specialised stringifiers for further document types (see Indexing further document types).
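As a rough sketch of how a comma-separated extension list like the one above can be matched against attachment names, the following shell function checks a file name against a KINOSEARCHINDEXEXTENSIONS-style value. The function is hypothetical and assumes an exact, case-sensitive match on the file's final extension; the add-on's own parsing may differ.

```shell
# Hypothetical helper: succeeds (exit 0) if the file's final ".ext" appears in
# a comma-separated extension list such as the KINOSEARCHINDEXEXTENSIONS value.
# ASSUMPTION: exact, case-sensitive comparison; the real add-on may differ.
is_indexed_extension() {
    list=$1
    file=$2
    case $file in
        *.*) ext=".${file##*.}" ;;   # text after the last dot, e.g. ".pdf"
        *) return 1 ;;               # no extension at all
    esac
    # Split the list on commas, trim whitespace, and skip empty entries
    # (such as one left by a trailing comma).
    old_ifs=$IFS
    IFS=,
    for entry in $list; do
        entry=$(printf '%s' "$entry" | tr -d '[:space:]')
        [ -n "$entry" ] || continue
        if [ "$entry" = "$ext" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, with exts='.pdf, .html, .txt', the call is_indexed_extension "$exts" report.pdf succeeds, while slides.ppt does not.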
form_name. How to search for this is described below.
Note: kinoupdate only indexes the form fields that existed when the
initial index was created. If you add a form, or add a new field to
an existing form, create a new index with kinoindex.
lib/TWiki/Contrib/KinoSearch/StringifierPlugins.
You can add new stringifier plugins simply by adding new files to this directory. At a minimum, a plugin must:
* inherit from TWiki::Contrib::SearchEngineKinoSearchAddOn::StringifyBase
* register itself for a file extension: __PACKAGE__->register_handler($application, $file_extension);
* implement $text = stringForFile($filename)
Then add the new extension to KINOSEARCHINDEXEXTENSIONS in TWikiPreferences. The
document type should now be indexed using the new stringifier.
NOTE: If you just extend the list without having a special stringifier
in place, this document type is treated like an ASCII file. For binary
document types, this may lead to problems (improper search results,
long indexing times and indexing failures).
{SearchEngineKinoSearchAddOn}{WordIndexer} setting.
Note 2: If you do not install any of the backends mentioned above,
remove .doc from the KINOSEARCHINDEXEXTENSIONS variable.
To install antiword on Debian: apt-get install antiword
apt-get install abiword
apt-get install wv
apt-get install xpdf-utils ppthtml
If you do not install xpdf, remove .pdf from the KINOSEARCHINDEXEXTENSIONS variable.
If you do not install ppthtml, remove .ppt from the KINOSEARCHINDEXEXTENSIONS variable.
Install docx2txt.pl (http://docx2txt.sourceforge.net) and pptx2txt.pl (http://pptx2txt.sourceforge.net) at appropriate paths. On most Linux/Unix systems, these tools can go into the /usr/bin directory.
perl -MCPAN -e "install KinoSearch"
perl -MCPAN -e "install File::MMagic"
perl -MCPAN -e "install Module::Pluggable"
perl -MCPAN -e "install HTML::TreeBuilder"
perl -MCPAN -e "install Spreadsheet::ParseExcel"
perl -MCPAN -e "install CharsetDetector"
perl -MCPAN -e "install Encode"
perl -MCPAN -e "install Spreadsheet::XLSX"
perl -MCPAN -e "install Text::Iconv"
Note for Windows: make sure you have a C compiler in place. One is normally included with Visual Studio, for example.
Use configure to set the advanced features:
{SearchEngineKinoSearchAddOn}{showAttachments}
{SearchEngineKinoSearchPlugin}{EnableOnSaveUpdates}
{SearchEngineKinoSearchAddOn}{WordIndexer}
$TWiki::cfg{RCS}{SearchAlgorithm} = 'TWiki::Store::SearchAlgorithms::Kino';
configure interface (Go to Plugins->Find More Extensions) .zip or .tgz archives
perl <module>_installer )
configure and enable the module, if it is a plugin.
,v files in your existing install (take care not to lock the files when you check in)
(Note, these are not where the defaults are set)
* KinoSearch settings
* Set KINOSEARCHINDEXEXTENSIONS = .pdf, .doc, .xml, .html, .txt, .xls, .ppt, .pptx, .docx, .xlsx
* Set KINOSEARCHSEARCHATTACHMENTSONLY = 0
* Set KINOSEARCHSEARCHATTACHMENTSONLYLABEL = Display only attachments
* Set KINOSEARCHINDEXSKIPWEBS = Trash, Sandbox
* Set KINOSEARCHINDEXSKIPATTACHMENTS = Web.SomeTopic.AnAttachment.txt, Web.OtherTopic.OtherAttachment.pdf
* Set KINOSEARCHANALYSERLANGUAGE = en
* Set KINOSEARCHSUMMARYLENGTH = 300
* Set KINOSEARCHDEBUG = 0
* Set KINOSEARCHMAXLIMIT = 2000
* Set KINOSEARCH_ATTACHMENT_INDEX_SIZELIMIT = 2000
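To review which KinoSearch preferences a topic actually sets, you can extract the Set lines from its raw text. The sketch below assumes the usual TWiki bullet form (indentation followed by "* Set NAME = value") and a readable topic file, for example data/Main/TWikiPreferences.txt in a default installation. The helper name is hypothetical.

```shell
# Print "NAME=value" for every KINOSEARCH* preference set in a TWiki topic file.
# Matches lines of the form "<spaces>* Set KINOSEARCH... = value".
kinosearch_settings() {
    sed -n 's/^[[:space:]]*\* Set \(KINOSEARCH[A-Z_]*\) = \(.*\)$/\1=\2/p' "$1"
}

# Example:
# kinosearch_settings /path/to/twiki/data/Main/TWikiPreferences.txt
```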
You can also configure (in the Extensions section of configure, under SearchEngineKinoSearchAddOn):
$TWiki::cfg{KinoSearchLogDir} = '/home/httpd/twiki/kinosearch/logs';
$TWiki::cfg{KinoSearchIndexDir} = '/home/httpd/twiki/kinosearch/index';
Remember to edit the file kinosearch/bin/LocalLib.cfg and modify twikiLibPath according to your configuration.
SearchEngineKinoSearchAddOn directory (e.g. /var/www/twiki/working/work_areas/SearchEngineKinoSearchAddOn directory if your TWIKI_ROOT is /var/www/twiki and $TWiki::cfg{WorkingDir} = /var/www/twiki/working )
These attachments are skipped the next time indexing runs.
Check that antiword, abiword or wvHtml is in place: type antiword, abiword or wvHtml at the prompt and check that the command exists.
Check that pdftotext is in place: type pdftotext at the prompt and check that the command exists.
Check that ppthtml is in place: type ppthtml at the prompt and check that the command exists.
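The three checks above can be combined into one loop. This sketch (check_backends is a hypothetical name) uses the POSIX command -v builtin to report which stringification backends are on the PATH of the current shell; note that the web server's PATH may differ.

```shell
# Report which external stringification backends are available on PATH.
check_backends() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
        fi
    done
}

# Example: the backends named in this document.
# check_backends antiword abiword wvHtml pdftotext ppthtml
```

If all of antiword, abiword and wvHtml are reported missing, remove .doc from KINOSEARCHINDEXEXTENSIONS; likewise remove .pdf if pdftotext is missing and .ppt if ppthtml is missing.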
kinosearch/bin twiki installation directory.
./kinoindex
TWiki/KinoSearch topic.
ks_test kinoindex
scripts fail, take too long on attachments, or kinosearch does not yield correct
results. Sometimes this results from installation errors, especially in
the installation of the stringification backends.
ks_test gives you the opportunity to test the stringification in
advance.
Usage: ks_test stringify file_name
(I plan to extend ks_test, but at the moment the only supported command
is stringify.)
The output shows which stringifier is used and the result of the
stringification.
Example:
/home/httpd/twiki/kinosearch/bin$ ./ks_test stringify /home/httpd/twiki_svn/SearchEngineKinoSearchAddOn/test/unit/SearchEngineKinoSearchAddOn/attachement_examples/Simple_example.doc
Used stringifier: TWiki::Contrib::SearchEngineKinoSearchAddOn::StringifyPlugins::DOC_antiword
Stringified text:
Simple example
Keyword: dummy
Umlaute: Größer, Überschall, Änderung
This shows that the stringifier DOC_antiword is used and that the resulting text looks correct.
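To check several attachments at once, ks_test can be called in a loop. In this sketch, KS_TEST and which_stringifier are hypothetical names, and it assumes (based on the example above) that ks_test prints the chosen stringifier on a line beginning with "Used stringifier:".

```shell
# Path to the ks_test script (hypothetical variable; defaults to the current dir).
KS_TEST=${KS_TEST:-./ks_test}

# For each file, print "<file><TAB><stringifier>", or "unknown" if ks_test
# printed no "Used stringifier:" line (ASSUMED output format, see example above).
which_stringifier() {
    for f in "$@"; do
        s=$("$KS_TEST" stringify "$f" 2>/dev/null | sed -n 's/^Used stringifier: *//p')
        printf '%s\t%s\n' "$f" "${s:-unknown}"
    done
}

# Example (run from kinosearch/bin):
# which_stringifier /path/to/report.doc /path/to/slides.ppt
```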
| Add-on Author: | TWiki:Main/MarkusHesse |
| Add-on Version: | 2012-11-13 |
| Copyright: | © 2007-2009 TWiki:Main.DavidGuest © 2009 Twiki, Inc © 2009-2012 TWiki:TWiki.TWikiContributor |
| License: | GPL (GNU General Public License) |
| Change History: | |
| 2012-11-13: | TWikibug:Item7020 |
| 9 Oct 2009: | version 1.19, added support to index documents of type .docx, .pptx, .xlsx |
| | Bug:Item6177: Attachments that cannot be stringified are added to the work area and are skipped the next time indexing runs |
| 20 Aug 2008: | v 1.18, added Integrated SEARCH, SearchEngineKinoSearchPlugin, restHandlers, updated code and tests -- TWiki:Main.SvenDowideit |
| 6 Aug 2008: | v 1.17, Bugs:Item5717 |
| 4 Jun 2008: | v 1.16, Bugs:Item5646 |
| 12 May 2008: | v 1.15, Bugs:Item5579 |
| 23 Apr 2008: | v 1.14, Bugs:Item5273 |
| 27 Jan 2008: | v 1.13, Bugs:Item5271 |
| 19 Jan 2008: | v 1.12, Bugs:Item5270 |
| 19 Dec 2007: | v 1.11, Additions on stringifiers, modification of output format |
| 17 Nov 2007: | v 1.10, PPT stringifier added |
| 11 Nov 2007: | v 1.09, Some bugfixing |
| 3 Nov 2007: | v 1.08, Some bugfixing |
| 7 Oct 2007: | v 1.07, Some bugfixing |
| 6 Oct 2007: | v 1.06, Upgrade for 4.1, Release with BuildContrib |
| 29 Sep 2007: | v 1.05, Indexing of form fields |
| 16 Sep 2007: | v 1.04, Stringifier plugins for doc, xls and html |
| 13 Sep 2007: | v 1.03, Indexing of PDF and TXT attachments |
| 08 Sep 2007: | v 1.02, Index and update script enhanced |
| 24 Aug 2007: | v 1.01, Update script included, Result uses highlighter |
| 14 Aug 2007: | Initial version (v1.000) |
| CPAN Dependencies: | CPAN:KinoSearch |
| | CPAN:File::MMagic |
| | CPAN:Module::Pluggable |
| | CPAN:HTML::TreeBuilder |
| | CPAN:Spreadsheet::ParseExcel |
| | CPAN:CharsetDetector |
| | CPAN:Encode |
| Other Dependencies: | pdftotext (part of xpdf-utils) |
| | antiword, abiword or wvWare |
| | ppthtml |
| Perl Version: | Tested with 5.8.0 |
| License: | GPL |
| Add-on Home: | http://TWiki.org/cgi-bin/view/Plugins/SearchEngineKinoSearchAddOn |
| Feedback: | http://TWiki.org/cgi-bin/view/Plugins/SearchEngineKinoSearchAddOnDev |
| Appraisal: | http://TWiki.org/cgi-bin/view/Plugins/SearchEngineKinoSearchAddOnAppraisal |