sub beforeAttachmentSaveHandler {
    my ( $attrHash, $topic, $web ) = @_;
}
sub afterAttachmentSaveHandler {
    my ( $attrHash, $topic, $web, $error ) = @_;
}
I've never used them myself, but I think MartinCleaver has.
-- CrawfordCurrie - 20 Nov 2004
Yes, Crawford is right: afterAttachmentSaveHandler would be ideal for your needs. It is defined to run "sometime after" the attachment is uploaded. (The current implementation runs before returning to view, but I envisioned a regular job working through a queue of pending changes.)
I tried to update the .changes and .plucupdate files; from them we can know which files have been updated and when the plucupdate script last ran. Then we can schedule a crontab job for plucupdate each hour (or whatever interval you like), and only the most recent changes that are not yet indexed will be processed by the crontab job.
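The incremental idea can be sketched roughly as follows. This is only an illustration, not the add-on's actual code: it assumes each line of a web's .changes file holds tab-separated topic name, user, change time (epoch seconds) and revision, oldest entry first, and that the time of the previous plucupdate run is available as a number. The function name `topics_changed_since` is hypothetical.

```perl
use strict;
use warnings;

# Return the topics changed after $last_run, newest first, each only once.
# $changes_text: contents of a .changes file (assumed tab-separated:
# topic, user, epoch time, revision), oldest entry first.
sub topics_changed_since {
    my ( $changes_text, $last_run ) = @_;
    my ( @topics, %seen );
    foreach my $line ( reverse split /\n/, $changes_text ) {
        my ( $topic, $user, $time, $rev ) = split /\t/, $line;
        next unless defined $time && $time =~ /^\d+$/;    # skip malformed lines
        last if $time <= $last_run;                       # reached entries already indexed
        push @topics, $topic unless $seen{$topic}++;
    }
    return @topics;
}

# Made-up sample data for illustration.
my $changes = join "\n",
    "WebHome\tguest\t1000\t1",
    "TestTopic\tguest\t2000\t3",
    "WebHome\tguest\t3000\t2";
print join( ', ', topics_changed_since( $changes, 1500 ) ), "\n";
```

Only the topics returned here would need to be re-indexed by the scheduled job; everything at or before the stored time is skipped.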
I have to finish testing, but I hope to update the add-on topic tomorrow and to upload a new ZIP file with the new version (I've also added comments to the script code).
-- JoanMVigo - 22 Nov 2004
OK. The incremental version is ready; however, I want to improve it because under very heavy usage the plucupdate script may raise the "too many files open" error (I've read at Plucene mailing list
plucindex initialized the .plucupdate file for each web.
-- JoanMVigo - 26 Nov 2004
I really want to give this a go, but my plate is already full with the DEVELOP branch. However, rest assured I will try it out and give feedback just as soon as I can!
-- CrawfordCurrie - 26 Nov 2004
Like Crawford, I agree that what you are doing is important, but lack the time to help out. Are there specifics you need assistance with? How do you interface to TWiki's inbuilt search mechanism?
-- MartinCleaver - 29 Nov 2004
The plucsearch script reuses a few lines of code from Search.pm (retrieving webs, listing the topics of each web, checking access to result topics). However, I didn't code the inline search, or any options to limit the search to a specific web or to order the results by topic name, author or date.
To Do (unresolved questions)
   * plucsearch script: limit search to a web scope, order results by topic name, author or date
# lines 116-118
# only pdf, html and txt - for more file types look for Plucene::SearchEngine at search.cpan.org
if ( $name =~ m/\.pdf$/ || $name =~ m/\.html$/ || $name =~ m/\.txt$/ || $name =~ m/\.doc$/ || $name =~ m/\.xls$/ || $name =~ m/\.ppt$/ ) {
    $author = $attachment->{'user'};
Similar changes are required in plucupdate script at line no. 221
-- SopanShewale - 08 Dec 2004
Thanks for your efforts Sopan. I'll try to integrate your parsers into the main branch -
However, I'd like to code something that is aware of new implementations, so that the add-on does not need to be modified each time new parsers/indexers become available. Does Perl have some mechanism to check or list the member classes of one class? (something like the reflection API in Java, so that CPAN:Plucene::SearchEngine::Index
* Plucene settings
* Set PLUCENEINDEXPATH = /srv/www/personal/index or where your index folder is located
* Set PLUCENEINDEXEXTENSIONS = .pdf,.html,.txt,.doc
The PLUCENEINDEXPATH variable should be included in FINALPREFERENCES.
Now, if you use the contribution (ExtraBackendParsers.zip) by TWiki:Main.SopanShewale/usr/lib/perl5/site_perl/5.8.0/Plucene/SearchEngine/Index/ for me.
-- JoanMVigo - 15 Dec 2004
Because Plucene lacks wildcard searching, partial topic-name searching is a little difficult. To support partial topic-name search, changes to the plucindex, plucupdate and plucsearch scripts are required. I have created patches for the scripts that let us search topics as described below.
Topic name: MyNewelyCCreated-Topic
Any of the queries "topic:My", "topic:Newely", "topic:Created", "topic:Topic" returns this topic.
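The patches themselves are not shown here, so the following is only a guess at how a topic name could be split into the searchable parts listed above. It assumes each part starts with a capital letter followed by lowercase letters or digits, so single capitals and separators are skipped. The function name `topic_name_parts` is hypothetical.

```perl
use strict;
use warnings;

# Split a topic name into capitalised word parts, e.g. so each part can be
# indexed as a separate "topic:" term. Must be called in list context.
sub topic_name_parts {
    my ($topic) = @_;
    my @parts = $topic =~ /([A-Z][a-z0-9]+)/g;
    return @parts;
}

print join( ' ', topic_name_parts('MyNewelyCCreated-Topic') ), "\n";
# prints: My Newely Created Topic
```

Indexing each part as its own term is what would let a query such as "topic:Created" match without wildcard support.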
patches with three files plucindex.patch, plucsearch.patch and plucupdate.patch in it. Use the patch command to patch each individual script.
-- SopanShewale - 23 Mar 2005
Hello, I installed the plugin, but now I get
Software error: Can't use an undefined value as an ARRAY reference at /usr/local/share/perl/5.6.1/Plucene/Search/BooleanQuery.pm line 122.
undefined value as an ARRAY reference, maybe some required package is missing or a version conflict occurs between required and installed packages. (Please describe your software environment in a bit more detail, thanks!)
grep search engine are missing, it also should be improved a lot.
grep search engine is. Once TWiki is installed, you should then choose the search engine to use by switching a configuration parameter in TWiki.cfg or TWikiPreferences. What do you think about it?
topic:Word (workaround for missing Plucene wildcard search)
.zip into CVS
-- WillNorris - 19 Jul 2005
Some answers: * is just another char) and regexps. To make it fully compatible with those functions, a lot of code should be added to the Plucene package, not to the add on.
Plucene::Store::InputStream cannot open /tmp/QoBg7OKMLk/_627.f24 for reading: Too many open files at /usr/local/lib/perl5/site_perl/5.8.6/Plucene/Store/InputStream.pm line 35.
(in cleanup) Plucene::Store::InputStream cannot open /tmp/QoBg7OKMLk/_627.f24 for reading: Too many open files at /usr/local/lib/perl5/site_perl/5.8.6/Plucene/Store/InputStream.pm line 35.
I spent half the day looking for a fix. Setting ulimit -n 2000 did nothing, and adding use BSD::Resource; setrlimit(RLIMIT_NOFILE, 2000, RLIM_INFINITY); did nothing either. Eventually, I found this rather obscure reference: http://plucene.minty.org/cgi-bin/wiki.pl?Totally_Un-Official_Plucene_FAQ#0020Plucene::Index::Writer::mergefactor to default to 5 instead of 10, and finally it works! I think this should be settable from a preference.
-- WadeTurland - 25 Aug 2005
Running plucupdate produces the following error:
"my" variable $writer masks earlier declaration in same scope at ./plucupdate line 272.
-- JosMaccabiani - 26 Aug 2005
Hi JosMaccabiani,
This is because $writer is already defined somewhere near line 181. Just remove "my" from the line
my $writer = Plucene::Index::Writer->new($idxpath, $analyser, 0);
-- SopanShewale - 29 Aug 2005
Hi,
I have installed the SearchEnginePluceneAddOn. It runs fine when I insert the
<form action="%SCRIPTURLPATH%/plucsearch%SCRIPTSUFFIX%/%INTURLENCODE{"%INCLUDINGWEB%"}%/">
<input type="text" name="search" size="32" />
</form>
in the page. But when I insert this text in the WebLeftBar,
I get no answers.
What must I change to make this work correctly?
-- KarlHeinzWichmann - 21 Sep 2005
Hello KarlHeinzWichmann,
I faced a similar problem while developing ApplicationAuthenticationAddOn. twiki.pattern.tmpl includes "WebLeftBar" inside an HTML form, so if you add a search form to the WebLeftBar topic, it creates a form within a form, which is a problem for the browser.
Change the following block in twiki.pattern.tmpl from
%TMPL:DEF{"leftbar"}%<div class="twikiLeftBar"><div class="twikiWebIndicator"><b>%WEB%</b></div>
<div class="twikiLeftBarContents"><form name="main" action="%SCRIPTURLPATH%/view%SCRIPTSUFFIX%/%WEB%/%TOPIC%">
%INCLUDE{"WebLeftBar"}%</form></div></div>%TMPL:END%
to
%TMPL:DEF{"leftbar"}%<div class="twikiLeftBar"><div class="twikiWebIndicator"><b>%WEB%</b></div>
%INCLUDE{"WebLeftBar"}%</div>%TMPL:END%
This should solve your problem.
-- SopanShewale - 30 Sep 2005
Hi JosMaccabiani,
You had a following question:
> 1. My end users only know Google. What are the main differences between Google and Plucene (bar the scoring)? Has anybody already tried to explain this to his/her end users?
plucupdate (html, txt and topic text is OK). Running plucindex fixes the issue. Any idea what is going on?
Also, I need this functionality for a document management system TWikiApplication: Is there a way to limit the search scope to only one web? Or, alternatively, all attachments in topics that have form XYZ? Preferably I'd like to hide that search scope from the user, e.g. in a hidden form field. Something like <input type="hidden" name="searchweb" value="%WEB%" />
I support the idea of Google like search. This is the standard people expect nowadays. Idea: The add-on could translate soap +wsdl "web service" -shampoo into the Plucene syntax.
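A translation step of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the add-on's implementation: it assumes Plucene's query parser accepts Lucene-style `+term` (required), `-term` (prohibited) and quoted phrases, and it simply marks every unprefixed term as required so that the default behaves like Google's "all words". The function name `google_to_required_terms` is hypothetical.

```perl
use strict;
use warnings;

# Translate a Google-like query into a form where every plain term is
# required. Terms already carrying + or - are kept as they are, and quoted
# phrases stay intact.
sub google_to_required_terms {
    my ($query) = @_;
    my @out;
    # A token is an optional +/- sign followed by a "quoted phrase" or a bare word.
    while ( $query =~ /([+-]?)("[^"]*"|\S+)/g ) {
        my ( $sign, $term ) = ( $1, $2 );
        $sign = '+' if $sign eq '';
        push @out, $sign . $term;
    }
    return join ' ', @out;
}

print google_to_required_terms('soap +wsdl "web service" -shampoo'), "\n";
# prints: +soap +wsdl +"web service" -shampoo
```

Doing the rewrite in the search script keeps the Plucene modules untouched, at the cost of a second, simplified tokenizer that has to stay compatible with the real query syntax.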
-- PeterThoeny - 06 Jan 2006
Peter, thanks for your comments. About the issues with "plucupdate": I am able to use it on my setup (Cairo release); still, I will go through the script to fix the issue.
Limiting search scope to a particular web: that's also my requirement. We have to handle this hidden value appropriately to give the results. Already, if you search "web:Myweb sometext", this returns the results from the Myweb web. I should be able to do this work.
Expectations like Google: please see my comments of 30 Sept 2005. The default behavior is "OR", which can be converted into "AND" by making changes in the Plucene::QueryParser module. The other +, - stuff works similarly to Google. Yes, wildcard search is not yet supported by "Plucene"; that development should happen.
Indexing speed: my intranet site has around 5500 documents (topics and attachments), and it takes around 2 hours to index. Indexing time should be reduced. I think someone should consider using Lucene or some other port of Lucene for indexing.
I am planning to test this add-on thoroughly for the Dakar release.
-- SopanShewale - 10 Jan 2006
When I try to execute twiki/bin/plucindex, I get "undefined subroutine &Twiki::basicInitialize called at ./plucindex line 42".
Could anybody give me some help?
-- TWikiGuest - 09 Feb 2006
We (SopanShewale and myself) are working on a new version that uses only functions provided by the TWiki::Func module. So it will be compatible at least with Dakar (and Cairo, I hope).
Some of the issues discussed in this topic have been addressed (limiting scope, Google-like search queries, skipping defined webs when indexing), so we hope it will be quite useful. Stay tuned.
-- JoanMVigo - 23 Feb 2006
Great! Just asking, what is the timeline for the new version?
-- PeterThoeny - 27 Feb 2006
Peter, we should be able to release the new version compatible with Dakar by Friday, March 3. If it does not work for Cairo, we will provide a separate file set for Cairo by March 10.
-- SopanShewale - 27 Feb 2006
Finally, new versions of this add-on have been released, one for Cairo and another for Dakar. We would very much appreciate your feedback. Note also that, due to the lack of functionality exposed by TWiki::Func, the two versions of this add-on still use internal core functions of TWiki.
For people interested in Plucene and/or its development, I post some links here.
SearchEnginePluceneAddOn-Cairo.zip and SearchEnginePluceneAddOn-Dakar.zip it is preferred to overwrite SearchEnginePluceneAddOn.zip with the latest Cairo version, then overwrite it with the latest Dakar version; and in the add-on text, point to the latest Cairo version (with a viewfile link). If you do not like this setup you could overwrite the SearchEnginePluceneAddOn.zip with the Dakar version, and keep a separate Cairo zip.
-- PeterThoeny - 02 Mar 2006
One of the first things I was asked after my TWiki was up and running was "how can I search for keywords in MS-Office attachments?" Well, I thought, that's easy. Google for TWiki and MS-Office, and there you go.
But now things are starting to get hairy. Is there something like a PluceneForTWikiQuickStartGuide ? I find myself running perl -wTd to find out what the "required third party tools" might be (hint: all of xlhtml, ppthtml and wv are available as Debian packages)....
-- HaraldJoerg - 03 Mar 2006
Yes, you are right. I should update the topic with the following instructions ...
Just building Plucene with
perl -MCPAN -e "install Plucene"
perl -MCPAN -e "install Plucene-SearchEngine-1.1"
should make the Plucene installation straightforward. Regarding document parsers (3rd party tools):
DOC.pm file
ExtraBackendParsers.zip provided by SopanShewale
DOC.pm. Change lines 1, 3, 8, 12 & 19 with corresponding extension, mime type and external tool and you will get a brand new parser.
-- JoanMVigo - 03 Mar 2006
Sopan and Joan: I added both of you to the TWikiCommunityGroup so that you can move/delete content. Please review the notes on that group topic.
-- PeterThoeny - 04 Mar 2006
Thanks, Joan, for the explanations. The installation of Plucene and its search engine is straightforward, but takes quite some time on slim installations (like the VM engine) due to the list of dependencies, much longer than installing the plugin and its extra parsers together.
Two notes on the ExtraBackendParsers: Spreadsheet::ParseExcel, which seems to croak on the Excel2003 files we use in our office. On the other hand, xlhtml does the trick.
Doc.pm seems to go an extra loop by converting .doc to .pdf, and then .pdf to .html. Is the result better than directly converting .doc to .html (which the wv package can do as well)?
.doc, Spreadsheet::ParseExcel vs. xlhtml with .xls) with regard to indexing performance, suitability for search?
-- HaraldJoerg - 04 Mar 2006
Hi there!
Thanks for the great search engine. I have tried it with the latest Dakar version, but I only get results for topics that have attachments. No results for "normal" topics. At first, the scripts (plucindex, plucupdate) didn't work, but I fixed that. So my only remaining problem is getting results for "normal" topics. What am I doing wrong?
-- HugoKuegerl - 12 Mar 2006
Hugo, the scripts work fine with the last Dakar release (build 8740). The only changes needed are: $twikiLibPath in your_twiki_path/plucene/bin/LocalLib.cfg
plucene/logs for problems indexing topics/attachments.
Regarding your searches, topics within webs with NOSEARCHALL = on are not displayed.
-- JoanMVigo - 13 Mar 2006
I installed the latest version on TWiki-4 and run into some problems summarized at PluceneAddOnIssues.
-- PeterThoeny - 15 Mar 2006
Hi all. I have just uploaded a new release of this add on which solves an issue while updating when topics have similar names: TestTopic1, TestTopic2, TestTopic3, ...
Also, the values of the PLUCENEINDEXEXTENSIONS TWiki variable have changed. Each extension needs a DOT before it. Just type Set PLUCENEINDEXEXTENSIONS = .pdf, .html, .txt, .doc as in older versions. Sorry for the inconvenience.
Thanks to PeterThoeny for bringing these issues to light. Also to HugoKuegerl for discovering a bug in index/update operations (Dakar indexing was always reading first version topic texts!!!)
-- JoanMVigo - 21 Mar 2006
I've been helping set up the plucene search add on to an experimental twiki installation. Everything was going swimmingly. We could index all the attachments we were interested in (except .pdf files, though since I installed pdftotext that should be fixed, too). Then, this morning, we found that all searches came up empty. The indexer appears to work normally. The logs look good. But there are no search results. Has anyone else seen this?
-- DavidHoughton - 20 Apr 2006
I installed plucene today and I'm facing the same problem as David. There are no search results.
-- AlokNarula - 11 May 2006
Is there any Apache configuration needed to enable plucsearch? I'm getting no search results even though my index is generated perfectly. However, the Apache error log says this:
Don't know how to turn into an index reader at /home/twiki/bin/plucsearch line 209, referer: http://localhost.localdomain/twiki/bin/view/TWiki/PluceneSearch/your_twiki_path/plucene/index )
plucsearch by default may be executed as user nobody/TWikiGuest. Please, check this! Also, consider that only allowed topics for the authenticated user may be displayed as results.
If you have user authentication enabled, you should add the following lines to /twiki/bin/.htaccess if using Apache login module
<Files "plucsearch">
require valid-user
</Files>
Otherwise, if using Template login module, launch /twiki/bin/configure script in your web browser and append plucsearch to {AuthScripts}
-- JoanMVigo - 08 Jun 2006
I have appended plucsearch to {AuthScripts} but plucsearch is unable to search restricted webs even though the index has been generated correctly. plucsearch works fine with public webs. What is the problem?
-- AlokNarula - 13 Jun 2006
Same here with restricted webs. I created a new topic in the Sandbox with an attachment and was able to get search results from both the topic and the attachment. I never get any results from the one web I have with restricted access. For example, if I comment out Set ALLOWWEBVIEW = InformationServicesGroup then I get results. I have logged in with several accounts, all of which are part of the InformationServiceGroup, and get the same results.
-- GordonTerrell - 19 Jun 2006
I have checked it, and the problem turns out to be this: the plucsearch script gets the user from the SESSION object exposed by the TWiki Func module: my $remoteUser = $TWiki::Plugins::SESSION->{remoteUser}; and ...
/twiki/bin/.htaccess configured to authenticate the plucsearch as described above (see my comments 08 Jun 2006), remoteUser is the one you typed, so the results are displayed ok, even with restricted webs, however ...
/twiki/bin/.htaccess, remoteUser is always the user guest even if you are authenticated using TemplateLogin and plucsearch appears in {AuthScripts}, so any restricted web's results are never listed.
plucsearch script and changing line 58, replacing the old one my $remoteUser = $TWiki::Plugins::SESSION->{remoteUser}; with this new one my $remoteUser = $TWiki::Plugins::SESSION->{user}->{login}; will solve this problem, and the plucsearch script will always work, regardless of which auth setup you have chosen.
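A more defensive variant of that lookup could try the user object's login first and fall back to remoteUser. The sketch below is hypothetical and has not been run against TWiki: it only assumes the session is a hash with the two layouts quoted above ({user}{login} and {remoteUser}), and it uses plain hashes as stand-ins. The helper name `login_from_session` and the sample login are made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Pick the login name from a session-like hash: prefer {user}{login},
# fall back to {remoteUser}, and finally to the given guest name.
sub login_from_session {
    my ( $session, $guest ) = @_;
    my $user = $session->{user};
    if ( ( reftype($user) || '' ) eq 'HASH' && defined $user->{login} ) {
        return $user->{login};
    }
    return defined $session->{remoteUser} ? $session->{remoteUser} : $guest;
}

# Plain hash standing in for the session object (illustration only).
my %session = ( remoteUser => 'guest', user => { login => 'jsmith' } );
print login_from_session( \%session, 'guest' ), "\n";
```

Using reftype rather than ref keeps the check working when the user entry is a blessed object, which is an assumption about how the session stores it.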
Once again, I am sorry for the delayed reply.
-- JoanMVigo - 21 Jun 2006
Thanks Joan. plucsearch can now find text in the restricted webs. Perhaps you can modify the plucsearch script and upload the latest plugin to TWiki.
-- AlokNarula - 21 Jun 2006
The template authentication search bug has been solved and a new release of this plugin is available in the add on topic. Thanks to AlokNarula and GordonTerrell for discovering the issue and for providing feedback.
I have also fixed a bug in updating the index: due to the partial topic name search enhancement, old topics were not removed from the index. Whenever possible, replace plucsearch and plucupdate with the latest versions (for the Cairo version, only plucupdate needs to be replaced).
Finally, the add-on was tested successfully on the latest Dakar release.
-- JoanMVigo - 27 Jun 2006
The plugin works well now and all my files are indexed, but I don't understand why Plucene doesn't search inside ppt files, even though they are indexed by plucupdate. It's strange. Thanks for your answers.
-- EmmanuelMaatouk - 11 Jul 2006
Emmanuel, to have ppt files indexed, you need to: ExtraBackendParsers.zip -- follow README file instructions
ppthtml, which is part of the xlhtml package ( available at http://chicago.sourceforge.net/xlhtml
if (($indexextensions{".$extension"}) should be if (($indexextensions{"$extension"})
-- GlennRoberts - 10 Oct 2006
Either that, or include a "." when listing the extensions. Plugin topic suggests extensions should be listed without a ".", Dev-topic suggests they should be included - I guess this Dev topic should be refactored and main plugin topic updated accordingly (.. when we get the time) :-).
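One way to sidestep the dot/no-dot mismatch entirely would be to normalise the preference value before looking extensions up. This is a sketch rather than the add-on's code: it assumes the preference is a comma-separated string as shown in this topic, and the function names `parse_index_extensions` and `is_indexable` are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup of indexable extensions from a preference value such as
# ".pdf, .html, txt". Entries are accepted with or without the leading dot.
sub parse_index_extensions {
    my ($pref) = @_;
    my %ext;
    foreach my $item ( split /\s*,\s*/, $pref ) {
        $item =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $item =~ s/^\.//;           # drop an optional leading dot
        $ext{ lc $item } = 1 if length $item;
    }
    return %ext;
}

# Return 1 if the file name ends in one of the configured extensions, else 0.
sub is_indexable {
    my ( $name, %ext ) = @_;
    return $name =~ /\.([^.\/]+)$/ && $ext{ lc $1 } ? 1 : 0;
}

my %ext = parse_index_extensions('.pdf, .html, txt, .doc');
print is_indexable( 'Report.PDF', %ext ) ? "index\n" : "skip\n";
```

With this kind of normalisation, both "pdf" and ".pdf" in the setting would behave the same, so the plugin topic and the Dev topic could not contradict each other in practice.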
-- SteffenPoulsen - 10 Oct 2006
Hello, I need help using Plucene IndexSearch.
It all works fine with my TWiki, but I want to attach ComplexHTML Documents, containing sub folders, to topics.
How do I have to change the plucindex code, and which files do I have to modify, so that all files in the subfolders are indexed too?
Does anyone have any ideas?
I thought about adding a line to the file twiki/bin/plucindex, at around line 250, in the method:
foreach my $attachDefP (@attachmentList){...
#process file
Plucene::SearchEngine::Index::File->examine(..) # adding such a line with the Subfolder to examine,
but it was only a guess of mine and I couldn't get it to work.
I would be thankful for any hints, ideas or solutions.
-- DanielWiechmann - 12 Oct 2006
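For the subfolder question, one possible starting point is to walk the topic's attachment directory recursively instead of relying only on the attachment list. The sketch below uses the core File::Find module and only collects matching file paths. How each path would then be handed to the indexer is not shown, and the function name `files_in_subfolders` is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Collect all files below $dir (including subfolders) whose extension is
# listed in @extensions (given without the leading dot).
sub files_in_subfolders {
    my ( $dir, @extensions ) = @_;
    my %wanted = map { ( lc($_) => 1 ) } @extensions;
    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each entry
            wanted   => sub {
                return unless -f $_;
                push @found, $_ if /\.([^.\/]+)$/ && $wanted{ lc $1 };
            },
        },
        $dir
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage: pass a directory such as twiki/pub/Main/Project on the command line.
if (@ARGV) {
    print "$_\n" for files_in_subfolders( $ARGV[0], qw(html htm) );
}
```

Files found this way have no topic meta data, so author and date would have to come from the file system or be left empty.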
Yes, you're right. The Dev topic is not as up to date as the zipped topic included in the release. You should prepend a dot to each extension, as in
 * Set PLUCENEINDEXEXTENSIONS = .pdf, .htm, .html, .txt, .doc
Regarding ComplexHTML Documents, if the topic meta has information about those files, plucindex should process them. Which extension do you use to attach those complex files?
-- JoanMVigo - 16 Oct 2006
Yesterday I installed the latest version of the Plucene search engine and the add-on for Dakar. After installing the CPAN module, the engine seemed to be working. But today I have problems. I changed PLUCENEINDEXEXTENSIONS to .pdf and so on. Now plucindex produces a huge number of errors while it is indexing the attachments. The error is
Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/perl5/vendor_perl/5.8.8/i586-linux-thread-multi/HTML/Parser.pm line 102
In addition there is another problem. When I look for a topic with an attachment, it looks good and works fine. But when I look for a topic without an attachment, there is no result, although PLUCENESEARCHATTACHMENTSONLY is 0. All the topics of the web are indexed.
-- MichaelWeber - 18 Oct 2006
Hey Joan, let me explain what I mean by these Complex HTML Documents. Each one is based on one HTML file and one folder. The folder contains additional HTML files that are referenced from that one file. I copy (I do not use the upload function) the file and the folder into the folder that is generated for a topic (say topic Project in web Main, giving the directory structure twiki/pub/Main/Project). Now, when I run the index script, the one HTML file is indexed but the files in the "attached" folder are not. Any idea how to modify the plucindex code?
-- DanielWiechmann - 19 Oct 2006
Sopan & Joan, I added a SHORTDESCRIPTION to the "Add-On Info" section so that this add-on is represented properly in the AddOnPackage topic and query topics. Please feel free to take this into the next release.
-- PeterThoeny - 04 Nov 2006
Is it possible to implement a "FromThisTopic" option? If selected, SearchEnginePluceneAddOn should only give results from a given topic and the topics it links to. Perhaps with the help of DirectedGraphWebMapPlugin.
-- RichardVinke - 11 Nov 2006
I know that this might go beyond the scope of this discussion, but would you please help this newbie with this issue? I am trying to install all the dependencies of this add-on, and I see that it requires wvWare. I got it from wvware.sourceforge.net, but I have no idea how to install it on a Debian Linux system. Would you be so kind as to explain the steps? Thanks,
-- MiloValenzuela - 13 Nov 2006
Milo: download libwmf-0.2.8.4.tar.gz.
Use commands: "./configure", "make", "make install". Solve errors before proceeding to the next step.
-- RichardVinke - 14 Nov 2006
Thanks for the quick response...I am totally new to Debian Linux. I finally installed everything as indicated (all dependencies, etc.) and set up all the variables. plucindex runs successfully; however, the search returns NO results. I think I correctly defined the variables:
---++ Plucene settings
* Set PLUCENEINDEXEXTENSIONS = .pdf, .htm, .html, .txt, .doc
* Set PLUCENEINDEXPATH = /home/httpd/twiki/plucene/index
* Set PLUCENEATTACHMENTSPATH = /home/httpd/twiki/pub
* Set PLUCENESEARCHATTACHMENTSONLY = 0
* Set PLUCENESEARCHATTACHMENTSONLYLABEL = Display only attachments
* Set PLUCENEINDEXVARIABLES = CONTACTINFO, JUSTANOTHERONE
* Set PLUCENEINDEXSKIPWEBS = Trash, Sandbox
* Set PLUCENEINDEXSKIPATTACHMENTS =
* Set PLUCENEDEBUG = 1
Any ideas?
Thanks!
-- MiloValenzuela - 14 Nov 2006
Milo: What do the log files say about the plucindex? I saw several questions about this problem (above), perhaps the answer is here.
-- RichardVinke - 15 Nov 2006
I had a similar problem - see my post above on 10 Oct 2006.
-- GlennRoberts - 15 Nov 2006
I installed all the dependencies as suggested above. The plucindex run is successful. It indexes all the expected attachments successfully. However, the searches only work within topics. It seems that it's not searching within the indexed documents. I don't think it's Glenn's issue because the indexing was successful. The plucene logs contain only the indexing log (which seems successful) and the Apache error log doesn't mention plucene. Is there any other way to debug this plugin?
-- MiloValenzuela - 15 Nov 2006
I have a problem. It does not seem to search the content of form fields. Should it? I have topics with a form attached, and it does find words in the topic text and even (with form:MyForm) the form name, but it does not find text in the form entries. Neither with searching for text nor FormField:text. Is this a local problem on my setup, or does it ignore form fields? And how can I start debugging (the index seems ok)? Otherwise it's a great plugin, thanks!
-- StephanMatthiesen - 04 Feb 2007
Correction: it does search form fields now, but only with a search string of the form FormField:text, which is only possible when I know the name of the field.
Is it possible to have a general search where it doesn't matter whether the search term is in the topic text or in a form field? Forms are quite important for structuring the TWiki, so it would be a shame if they were not really included in the search.
-- StephanMatthiesen - 04 Feb 2007
My plucene search never worked fine for searching within attachments...
it might be my uber newbieness with Linux, but I can't seem to cd into the plucene/index directory (which I suspect the search engine can't enter). If I do a "ls -l" it displays the directory but when I attempt a cd into it, it gives me the "no such file or directory" error. Any thoughts or recommendations?
-- MiloValenzuela - 03 Mar 2007
The execution flag is possibly not set on the directory. Try a: chmod 775 index
-- PeterThoeny - 05 Mar 2007
Peter, it tells me that it "cannot access 'index' : No such file or directory". It seems weird to me because ls -l lists the directory with the following characteristics: drwxr-sr-x 2 root www-data 8192 2007-03-01 19:11 index...but it cannot do anything with it. Is there such a thing as a "corrupted" directory in Linux, or is it again just my newbieness? Thanks,
-- MiloValenzuela - 08 Mar 2007
I am using antiword with the module attached below to index *.doc files. I am wondering if anyone else has run into the issue of indexing RTF formatted files with a *.doc extension. Word opens them just fine. Antiword, on the other hand, doesn't seem to know how to handle this data stream.
-- BrianGupta - 12 Mar 2007
Hello Milo, try this
* Set PLUCENEINDEXPATH = /home/httpd/twiki/plucene/index/
* Set PLUCENEATTACHMENTSPATH = /home/httpd/twiki/pub/
without spaces at the end of the line. That should fix your problem.
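If trailing whitespace in a preference value is indeed the cause, a script could guard against it by trimming the value and checking the directory before use. The following is a small sketch under that assumption. The helper names `trim_pref` and `check_index_path` are hypothetical and not part of the add-on.

```perl
use strict;
use warnings;

# Remove leading and trailing whitespace from a preference value.
sub trim_pref {
    my ($value) = @_;
    return '' unless defined $value;
    $value =~ s/^\s+//;
    $value =~ s/\s+$//;
    return $value;
}

# Describe what is wrong with an index path, or return '' if it looks usable.
sub check_index_path {
    my ($raw) = @_;
    my $path = trim_pref($raw);
    return 'path is empty' if $path eq '';
    return "not a directory: '$path'" unless -d $path;
    return "not readable and searchable: '$path'" unless ( -r $path and -x $path );
    return '';
}

my $problem = check_index_path("/home/httpd/twiki/plucene/index/  ");
print $problem eq '' ? "index path looks usable\n" : "$problem\n";
```

Printing the path between quotes in the message makes stray spaces visible, which plain `ls -l` output does not.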
I have another question:
Is it possible to search over directories, that you bind in your topics? For example:
[[file://Server/yourDirectory/]]
How do I have to modify this add-on to do this? Or are there any other add-ons that can do this? Thanks for your help.
-- JoergSchoenknecht - 17 Apr 2007
Hi, when I run ./plucindex, I get the following error:
Can't locate Plucene/Document.pm in @INC
--- ./plucene/bin/plucupdate 2006-06-27 10:39:50.000000000 +0200
+++ .././plucene/bin/plucupdate 2008-06-06 10:34:13.000000000 +0200
@@ -20,6 +20,7 @@
BEGIN { unshift @INC, '.'; require '../../bin/setlib.cfg' }
use TWiki;
+use TWiki::Func;
use Time::Local;
@@ -94,36 +95,34 @@
$debug && print "Checking $web ...";
- # NOTE violates store encapsulation, possible compatibility issue with future releases
- my $changes= $TWiki::Plugins::SESSION->{store}->readMetaData( $web, 'changes' );
- my $prevLastmodify = $TWiki::Plugins::SESSION->{store}->readMetaData($web,'plucupdate') || "0";
+ # Get the last time we indexed this web
+ my $lastmodifyDir = TWiki::Func::getWorkArea("Plucene");
+ my $prevLastmodify = 0;
+ if ( open(LAST, "<$lastmodifyDir/$web") ) {
+ my $prevLastmodifyTainted = <LAST>;
+ close LAST;
+ if( $prevLastmodifyTainted =~ /^(\d+)$/ ) {
+ $prevLastmodify = $1;
+ }
+ }
my $currLastmodify = "";
# do not process the same topic twice
my %exclude;
+ my $changes = TWiki::Func::eachChangeSince( $web, $prevLastmodify );
# process the web changes
- foreach( reverse split( /\n/, $changes ) ) {
- # Parse lines from .changes:
- #
- my ($topicName, $userName, $changeTime, $revision) = split( /\t/);
-
- if( ( ! %exclude ) || ( ! $exclude{ $topicName } ) ) {
- if( ! $currLastmodify ) {
- # newest entry
- $time = &TWiki::Func::formatTime( $prevLastmodify );
- $currLastmodify = $changeTime;
- if( $prevLastmodify eq $changeTime ) {
- # newest entry is same as at time of previous update
- $debug && print "-> no topics new/changed since $time\n";
- last;
- }
- $debug && print "-> changed topics since $time:\n";
- }
- if( $prevLastmodify >= $changeTime ) {
- # found item of last update
- last;
- }
+ $time = &TWiki::Func::formatTime( $prevLastmodify );
+ if( $changes->hasNext() ) {
+ # We have some changes
+ $debug && print "-> changed topics since $time:\n";
+ while( $changes->hasNext() ) {
+ my $change = $changes->next();
+ my ($topicName, $userName, $changeTime, $revision) = @{$change}{qw/
+ topic user time revision/};
+
+ $currLastmodify = $changeTime;
+ next if defined $exclude{ $topicName };
$exclude{ $topicName } = "1";
$debug && print " * $topicName\n";
push( @topicsToUpdate, [ $web, $topicName ] );
@@ -133,11 +132,18 @@
push( @topicsToUpdate, [ $web, "WebHome" ] );
}
}
+
+ if ( open(LAST, ">$lastmodifyDir/$web") ) {
+ print LAST $currLastmodify;
+ close LAST;
+ $debug && print "$lastmodifyDir/$web saved\n";
+ } else {
+ warn "Couldn't update $lastmodifyDir/$web: $!";
+ }
+ } else { # No new changes
+ $debug && print "-> no topics new/changed since $time\n";
+ $currLastmodify = $time;
}
-
- # NOTE violates store encapsulation, possible compatibility issue with future releases
- $TWiki::Plugins::SESSION->{store}->saveMetaData( $web, 'plucupdate', $currLastmodify );
- $debug && print "$web .plucupdate saved\n";
}
if (@topicsToUpdate > 0) {
--- ./plucene/bin/plucindex 2006-03-21 10:01:33.000000000 +0100
+++ .././plucene/bin/plucindex 2008-06-05 18:26:34.000000000 +0200
@@ -37,6 +37,7 @@
my $debug = ! ( @ARGV && $ARGV[0] eq "-q" );
# Log stuff: opening the log file
+use TWiki::Func;
my $time = TWiki::Func::formatTime( time(), '$year$mo$day', 'servertime');
my $logfile = "../logs/index-".$time.".log";
@@ -118,8 +119,15 @@
$logtime = TWiki::Func::formatTime( time(), '$rcs', 'servertime' );
print LOGFILE "| $logtime | Indexing web | $web | |\n";
- # NOTE violates store encapsulation, possible compatibility issue with future releases
- $TWiki::Plugins::SESSION->{store}->saveMetaData( $web, 'plucupdate', time() );
+ # Saves the last update run for this web
+ my $lastmodifyDir = TWiki::Func::getWorkArea("Plucene");
+ if ( open(LAST, ">$lastmodifyDir/$web") ) {
+ print LAST time();
+ close LAST;
+ $debug && print "$lastmodifyDir/$web saved\n";
+ } else {
+ warn "Couldn't update $lastmodifyDir/$web: $!";
+ }
# get the list of topics
my @topics = TWiki::Func::getTopicList( $web );
Hope this helps.
-- OlivierRaginel - 28 Jul 2008
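The per-web timestamp file used in the patch above can also be read and written with lexical file handles and three-argument open. This is a standalone sketch, not a replacement patch: it takes the directory as a plain argument (in the patch it comes from TWiki::Func::getWorkArea), uses a temporary directory as a stand-in, and the function names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read the last-indexed time for a web from $dir/$web; 0 if missing or invalid.
sub read_last_indexed {
    my ( $dir, $web ) = @_;
    open( my $fh, '<', "$dir/$web" ) or return 0;
    my $line = <$fh>;
    close $fh;
    # Accept digits only; the capture also untaints the value.
    return ( defined $line && $line =~ /^(\d+)$/ ) ? $1 : 0;
}

# Store the last-indexed time for a web; returns true on success.
sub write_last_indexed {
    my ( $dir, $web, $time ) = @_;
    open( my $fh, '>', "$dir/$web" ) or return 0;
    print {$fh} $time;
    return close $fh;
}

my $dir = tempdir( CLEANUP => 1 );    # stand-in for the plugin work area
write_last_indexed( $dir, 'Main', time() ) or warn "could not save timestamp\n";
print read_last_indexed( $dir, 'Main' ), "\n";
```

Returning 0 for a missing or malformed file means the first run after installation simply treats every change as new.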
Heya Olivier - I've already enabled your commit access
I reckon you should check it in and release it.
-- SvenDowideit - 02 Aug 2008
Sven, if I can figure out where I can modify it, I'd commit it right away, but for now, what's in the trunk are the modifications MichaelDaum made.
No idea where I can find this source to tweak it.
Also, as MichaelDaum pointed out, TWiki::Func::eachChangeSince only exists since 4.2.0, right? So we need to "fork" this per version of TWiki.
-- OlivierRaginel - 05 Aug 2008
Olivier, I merged your fixes to the changes I made in a way that no fork is needed. I will upload my changes asap.
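How the merge avoids a fork is not described here. One common way to support both old and new TWiki versions in a single script is to test at run time whether the newer function exists. The sketch below shows that technique with stand-in packages rather than the real TWiki::Func, and the function name `change_source_for` is hypothetical.

```perl
use strict;
use warnings;

# Return 'iterator' when the given package provides eachChangeSince,
# otherwise 'changes-file' to signal that the older approach is needed.
sub change_source_for {
    my ($package) = @_;
    return $package->can('eachChangeSince') ? 'iterator' : 'changes-file';
}

# Stand-in packages for illustration only (not the real TWiki::Func).
{
    package My::NewFunc;
    sub eachChangeSince { return }
}
{
    package My::OldFunc;
    sub getTopicList { return }
}

print change_source_for('My::NewFunc'), "\n";
print change_source_for('My::OldFunc'), "\n";
```

In a real script the same check could be made against TWiki::Func after loading it, with the old .changes parsing kept as the fallback branch.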
-- MichaelDaum - 05 Aug 2008
I had to make a few additional fixes on plucsearch to handle permissions, but I think your version is safe Michael. I can send you my changes if you wish, or check the channel logs, as I was doing this for gordho on Tuesday, 5th August 2008, around 7pm CEST.
-- OlivierRaginel - 08 Aug 2008
I added the code updates to plucupdate, but I could not get it to work. Instead I will just run plucindex at 4 AM each day. I hope it won't take too long when our content grows; with the wiki empty it takes about 2 minutes.
-- GregNeugebauer - 2009-10-14
I should note the above was on TWiki 4.3.2
-- GregNeugebauer - 2009-10-14
I ran Plucene on TWiki-5.1.2, Sun, 07 Oct 2012, build 23565, Plugin API version 1.4
I have since documented here all the necessary changes to files.
-- JavierFernandezSanchez - 2012-10-30
Javier, where have you documented this? We appreciate help in doc improvements.
-- PeterThoeny - 2012-10-30
Start with documentation.
[root@twiki ~]# lsb_release -a
LSB Version: :core-4.0-ia32:core-4.0-noarch:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 5.8 (Final)
Release: 5.8
Codename: Final
-- JavierFernandezSanchez - 2012-10-31
yum repolist
yum --enablerepo=* repolist
yum install gcc make links rcs
rpm -Uvh http://epel.mirrors.arminco.com/5/i386/epel-release-5-4.noarch.rpm

| I | Attachment | History | Action | Size | Date | Who | Comment |
|---|---|---|---|---|---|---|---|
| | DOC.pm | r1 | manage | 0.6 K | 2004-12-15 - 11:09 | UnknownUser | Index DOC files with Plucene::SearchEngine::Index::DOC.pm & antiword |
| | ExtraBackendParsers.zip | r1 | manage | 3.5 K | 2004-12-08 - 14:36 | SopanShewale | Backend Parsers to parse MS Word, Excel, PPT files. |
| | ScriptsWithLoggingFeatures.zip | r1 | manage | 8.1 K | 2005-07-08 - 12:52 | SopanShewale | Scripts with Logging stuff |
| | plucenscriptpatches.zip | r1 | manage | 2.0 K | 2005-03-23 - 12:17 | SopanShewale | The patches for partial-topic name search |