
Implemented: Exclude Topics from Search

%SEARCH{}% has a new excludetopic parameter. Specification, as documented in TWikiVariables:

| Parameter: | Description: | Default: |
| excludetopic="Web*" or excludetopic="WebHome, WebChanges" | Exclude topics from search: a topic, a topic with asterisk wildcards, or a list of topics separated by commas. | None |

This feature is useful to exclude template topics from showing up in a formatted search. Use FormattedSearchFormTesting to try it out.
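
To make the specification concrete, here is a minimal Python sketch of how such a value could be interpreted. It is not the TWiki implementation (which is Perl); the function names are hypothetical, and it assumes that each list entry must match the whole topic name and that the asterisk is the only wildcard.

```python
import re

def compile_excludetopic(spec):
    """Turn a value such as "WebHome, Web*" into one regex (hypothetical helper)."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    if not names:
        return None
    # Escape every literal piece; each asterisk becomes ".*".
    parts = [".*".join(re.escape(piece) for piece in name.split("*"))
             for name in names]
    return re.compile("|".join(parts))

def exclude_topics(topics, spec):
    """Return the topics that do not match any entry of the excludetopic value."""
    pattern = compile_excludetopic(spec)
    if pattern is None:
        return list(topics)
    # fullmatch: an entry has to cover the entire topic name (an assumption).
    return [t for t in topics if not pattern.fullmatch(t)]

topics = ["WebHome", "WebChanges", "TaskList", "FeatureBrainstorming"]
print(exclude_topics(topics, "Web*"))               # ['TaskList', 'FeatureBrainstorming']
print(exclude_topics(topics, "WebHome, TaskList"))  # ['WebChanges', 'FeatureBrainstorming']
```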

-- PeterThoeny - 01 Oct 2003

Discussions

I'm surprised in a way I've not needed this until now, and the Support.ExcludeWebTopicsFromSearch topic doesn't really deal with the issue, so I suspect I'm hitting boundary cases again.

The reason I need it is because I'm using named sections, finding bugs and use cases along the way.

Specifically I'm doing a search:

%SEARCH{ "Status.*[N]ew" limit="10" scope="text" reverse="on" order="modified" nosearch="on" regex="on" nototal="on" format="[[$topic][%INCLUDE{'$topic' section='title'}\%&nbsp;]]<font size=-1>[[%SCRIPTURL%/edit/%WEB%/$topic][edit]] %INCLUDE{'$topic' section='firstparagraph'}\%"}%

Picking some points:

  • The [N] is there for the usual reason - to exclude the main topic.
  • The &nbsp; is there to ensure the [[...]] results for pages with no "title" section aren't rendered as [[]].
  • The nested INCLUDE uses single rather than double quotes - this is a bit of a hack at the moment, as is the escaping of the \%.

Overall, this is working quite, quite nicely :-) (text turned to gibber in example)

However, not every page that contains this search string should be included. Examples are:

  • A main "section" page - much like the page per topic classification on TWiki.org (eg FeatureBrainstorming)
  • The webform topic with the page in.

Aside from that, the result is nice. The clear downside, though, is that on those pages neither the title section nor the firstparagraph section exists. This means a random extra edit box appears, confusing users. Clearly I can ditch this, but I'd rather not.

What I'd like to be able to do is one of the following:

  • Add to SEARCH an extra option that goes:
    • exclude="topic1,topic2,topic3"
    • excludepattern="(topicpat*|topicpat2*)"

I know twiki currently does a call out to grep, and this could be implemented as either:

  • grep (usual search) (usual files) | grep -v 'pattern' in the call out
  • Take the results which I presume get read into an array at some point and use perl's internal grep on the list to apply the equivalent of -v.
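
The second option can be sketched as follows. This is Python rather than Perl, purely for illustration; the function name and topic names are made up, and the list stands in for whatever the external grep returned.

```python
import re

def drop_excluded(found_topics, exclude_pattern):
    """Equivalent of `grep -v` on a list: keep names that do NOT match."""
    rx = re.compile(exclude_pattern)
    return [name for name in found_topics if not rx.search(name)]

# Hypothetical result list from the first (external) search step.
found = ["NewIdeaOne", "NewIdeaTwo", "IdeaTemplate", "IdeaForm"]
print(drop_excluded(found, r"(Template|Form)$"))  # ['NewIdeaOne', 'NewIdeaTwo']
```

Filtering the list in-process keeps the exclusion pattern out of the shell command line, so it needs no shell quoting.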

If this has been implemented by anyone, I'd appreciate a pointer. If it hasn't, comments on appropriate syntax are welcome...

-- MichaelSparks - 24 Jun 2003

This would be better implemented in Perl, see SearchWithNoPipe. Search.pm does an external grep search first, then a loop through the topics found where topics could be excluded.

Better to introduce a more specific term, like excludetopics="..."

-- PeterThoeny - 24 Jun 2003

The work-arounds for this problem are a real pain as you have to struggle to find something that will not match the template but will match in everything generated from the template. Adding %NOP% works only sometimes for me. Instead, I made the following changes to the code, which was quite painless, and adds an additional parameter excludetopic="TopicNameOne|TopicNameTwo|etc" so that you can explicitly state which topic files will be excluded from the search. This may be a good idea for a future release of the code as well.

In TWikiDotPm, we need to add the excludetopic parameter to the %SEARCH{...}% variable and to the call of TWiki::Search::searchWeb(). In SearchDotPm, add the formal parameter and exclude the topics with the same approach used above by MartinCleaver. I clipped out the code I had here before, as I have since made an appropriate patch file. Please see excludetopic_patch for details.

This could probably be improved per the comments of SearchWithNoPipe, but it seems the exclusion list would work happily enough. This approach is much clearer and cleaner than continuing to require that we get very tricky with obtuse syntactical tricks, like [ ] and %NOP%.

-- RaymondLutz - 23 Oct 2003

Thanks Raymond. I appreciate it. (It'd have been better with a patch conforming to the PatchGuidelines though - unfortunately my wikis are so far patched from the now ancient BeijingRelease that I can't easily provide the context diffs needed.)

Core Team: I've scheduled this for Cairo Release. Any objections? Any comment?

Raymond: I've also noted this on http://www.owiki.org/twiki/bin/view/Openwiki/ExcludeTopicFromSearch; you'll note that Michael progressed the spec a little, so if you fancy a bit more of a challenge do the final 25% to meet Michael's plan, that looks like a credible direction.

Additionally, if you want, you can create a branch in the CVS on that system and merge it into that distribution yourself.

-- MartinCleaver - 24 Oct 2003

This feature will simplify search, so it makes sense to put it into CairoRelease. This feature complements SearchTopicNameAndTopicText; both features should be consistent (parameter name and support for a list of topics). Since the parameter name for the other feature is topic, I suggest naming this parameter excludetopic. See the discussion on lists of topics in SearchTopicNameAndTopicText.

-- PeterThoeny - 25 Oct 2003

I am happy to see that this idea is being received positively as it means I was not missing some other way to do it (nicely).

  • I agree with PeterThoeny on naming the parameter excludetopic as this is more descriptive.
  • When implementing the changes, I noted an improvement that can be made in terms of the partitioning of the code. This was pointed out by the fact that we had to modify TWiki.pm in order to add a parameter. Currently, TWiki.pm parses the parameters to %SEARCH{}% and then passes each one as an actual parameter to the Search.pm function that handles the expansion. Instead, the entire content of the curly brackets should be passed to the Search.pm package, which can then parse the parameters as it sees fit. I am just getting to know the TWiki code (although I have coded extensively in Perl for years), so I don't know if this approach is already standard for plugins and extensions, but I would be surprised if it isn't, as otherwise TWiki.pm would have to be changed for each new feature and each new parameter that will undoubtedly be added over time. This will take a bit more "refactoring" and I didn't want to attempt it in a patch without having more experience with the progression of this code.
  • Regarding the "other 25%", I don't agree that it is necessary to complexify this by adding that 25%, i.e. the list of topic names separated by commas. It seems to me that the regex approach will handle all the cases needed, and serves to also exclude the topics based on name if the names are simply piped together. I thought about the code again and I don't think it falls into the problem described by SearchWithNoPipe as the exclusionary filtering is done to the list after it is first built using the system grep call. The grep used here is strictly a Perl affair and does not suffer from the same syntactical issues broached in that discussion. The only drawback to this is trying to simplify the syntax for non-regex users.
  • I went ahead and generated a patch according to PatchGuidelines. Thanks for the direction on getting that done. In the process, I made the name of the parameter excludetopic. I have attached the excludetopic_patch for your convenience. Use the following command to process the patch from the twiki directory (lib is subdir):
    patch -i excludetopic_patch -p0
Enjoy.

-- RaymondLutz - 28 Oct 2003

What a coincidence, yesterday I started to work on this feature among SearchTopicNameAndTopicText, SearchWithNoPipe, ArgumentListIsTooLongForSearch, InlineSearchArgListTooLong. So you will see a somewhat different implementation. Stay tuned.

We can't pass the whole parameter string to Search.pm since it is called also from the bin/search script by individual URL params.

I am wondering, should we name this feature simply exclude as originally proposed, or use the more descriptive excludetopic name? Opinions from others?

-- PeterThoeny - 29 Oct 2003

Regarding the calling scenario, the bin/search call should simply call a different entry point that extracts the parameters from the query string. No big deal, and it eliminates the long list of parameters. The idea is to program in an object-oriented fashion, regardless of whether you use customary object-oriented syntax. The variables that are extracted in bin/search don't belong there either, as it does nothing with them except pass them to the function.
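
A rough sketch of that partitioning, in Python rather than Perl and with entirely hypothetical function names (none of these are TWiki APIs): two thin entry points each build a parameter dictionary, and one search function reads only the keys it knows about. The attribute parser is deliberately simplistic and assumes double-quoted values with no nested quotes.

```python
import re
from urllib.parse import parse_qs

def search_web(params):
    """Single search entry point; stands in for the real search (hypothetical)."""
    return {
        "search": params.get("search", ""),
        "limit": int(params.get("limit", "0") or 0),
        "excludetopic": params.get("excludetopic", ""),
    }

def params_from_attributes(attr_string):
    """Parse the text between the braces of %SEARCH{...}%."""
    params = {}
    # The leading unnamed quoted string is taken as the search string.
    first = re.match(r'\s*"([^"]*)"', attr_string)
    if first:
        params["search"] = first.group(1)
    # Remaining name="value" pairs.
    for key, value in re.findall(r'(\w+)="([^"]*)"', attr_string):
        params[key] = value
    return params

def params_from_query(query_string):
    """Parse URL parameters as a bin/search style script would receive them."""
    return {key: values[0] for key, values in parse_qs(query_string).items()}

inline = search_web(params_from_attributes('"Status" limit="10" excludetopic="Web*"'))
from_url = search_web(params_from_query("search=Status&limit=10&excludetopic=Web*"))
print(inline == from_url)  # True
```

With this shape, adding a parameter such as excludetopic only touches the search function, not either caller.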

I used your proposed excludetopic naming in the patch I provided, which I think is a bit more descriptive, albeit longer.

I have been using this in some of my database-like implementations, and I find that I have to list many of the topics that would otherwise be accidentally listed in the search results, such as:

  • the template used for new topics using the form.
  • the form used for the topics
  • the WebPreferences topic if it includes the name of the form
  • any topics that are option lists for the form
  • the actual topic that is doing the search (self)

These are easily excluded by listing the topic names piped together in the excludetopic parameter, such as

excludetopic="TaskTopicTemplate|TaskDataForm|WebPreferences|AssignedTo|JobCategory|TaskList"

The topics (in this case AssignedTo and JobCategory) are option lists of the TaskDataForm and do not explicitly show the word TaskDataForm within the topic body, but they are children of the TaskDataForm, and so they are found by the search. This is because the file contains %META:TOPICPARENT{name="TaskDataForm"}%. These TOPICPARENT and perhaps TOPICINFO meta fields should be considered to be standard exclusions from the search. This would reduce the number of topics that have to be listed and would help users understand what is going on, as those hidden fields are not clear. I have a better idea I think, below, that solves this too.
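
To illustrate why such a topic is found, and what treating those meta fields as standard exclusions could look like, here is a small Python sketch. It is an assumption-laden illustration, not TWiki code: the helper names are invented, the sample topic text is made up, and it assumes each %META:...% field sits on its own line.

```python
import re

# Matches whole lines such as %META:TOPICPARENT{...}% or %META:TOPICINFO{...}%.
META_LINE = re.compile(r"^%META:(?:TOPICPARENT|TOPICINFO)\{.*\}%\s*$")

def visible_text(raw_topic_text):
    """Drop the two meta lines so only the user-visible body is searched."""
    return "\n".join(line for line in raw_topic_text.splitlines()
                     if not META_LINE.match(line))

def matches(raw_topic_text, search_string, skip_meta=True):
    """Plain substring search, optionally ignoring the meta lines."""
    text = visible_text(raw_topic_text) if skip_meta else raw_topic_text
    return search_string in text

# Hypothetical raw content of the AssignedTo option-list topic.
assigned_to = (
    '%META:TOPICINFO{author="guest" version="1.1"}%\n'
    '%META:TOPICPARENT{name="TaskDataForm"}%\n'
    "| *Name* |\n"
    "| Alice |\n"
)
print(matches(assigned_to, "TaskDataForm", skip_meta=False))  # True
print(matches(assigned_to, "TaskDataForm"))                   # False
```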

When I perform a search using the search box, (i.e. bin/search ) then I notice I get the same garbage listed as well as valid topics that the users will have some interest in. This brings up the question about this type of search and whether there should be a way to limit the search away from topics that are mostly administrative in nature, i.e. Forms, Templates, option lists and the like. I'm thinking that a better approach would be to have a variable that could be placed in the topic that would exclude it from most searches. In the search dialog, we could add a checkbox that would search everything.

PeterThoeny, you may have a good idea of the best way to do this. I would imagine we would want to include

  • Set NOTOPICSEARCH = on

in the topic, but I am interested to hear your opinion. Let's list the attributes of this direction:

  • eliminates the complexity of the %SEARCH{}% syntax and avoids long lists of exclusions in a regex string.
  • provides the complete elimination of Forms, Templates, option lists, and other administrivia from any searches, both from %SEARCH and the bin/search dialog.
  • We could provide a means to include these topics in a search dialog if a full search was desired (full search checkbox).
  • Does not provide a means to eliminate the topic from which the %SEARCH is being actioned (finding yourself). I claim that we never need to find ourself in any search of this type, i.e. the current topic of an embedded search should never find itself in the search. This would be a simple addition to the search code. Therefore, this is part of the proposal.
    • I use this feature often: Do a formatted search on my own topic for reporting, e.g. to show a summary of important form fields on top. -- PeterThoeny - 01 Nov 2003
  • One final benefit, no added parameter to the search parameter list (although I stand by my assertion that this is mispartitioned.)
  • User Implementation: easy -- whenever any page occurs in a search result that makes no sense, adding Set NOTOPICSEARCH = on in that page is trivial and clear, eliminating administrivia pages from search results.
  • Avoids requiring the confusing [T]ricks and %NOP%'s that are certainly hard for a beginner to understand.
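
The proposal above could be sketched like this. It is only an illustration in Python: NOTOPICSEARCH was a proposed setting, the function and its flags are hypothetical, and the topic names and texts are made up. The include_self flag is there because excluding the current topic was itself disputed in the inline comment above.

```python
import re

# A TWiki-style preference bullet: "   * Set NOTOPICSEARCH = on"
NOSEARCH = re.compile(r"^\s+\*\s+Set\s+NOTOPICSEARCH\s*=\s*on\s*$", re.MULTILINE)

def filter_results(topic_texts, current_topic=None,
                   full_search=False, include_self=False):
    """Drop opted-out topics (unless full_search) and, by default, the searching topic."""
    kept = []
    for name, text in topic_texts.items():
        if not include_self and name == current_topic:
            continue
        if not full_search and NOSEARCH.search(text):
            continue
        kept.append(name)
    return kept

# Hypothetical search hits: topic name -> raw topic text.
hits = {
    "TaskDataForm": "| *Name* |\n   * Set NOTOPICSEARCH = on\n",
    "TaskList": "%SEARCH{...}%",
    "FixPrinter": "Status: New",
}
print(filter_results(hits, current_topic="TaskList"))
# ['FixPrinter']
print(filter_results(hits, current_topic="TaskList", full_search=True))
# ['TaskDataForm', 'FixPrinter']
```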

If this seems like an appropriate direction to take, I would be willing to create the code to implement these changes.

-- RaymondLutz - 29 Oct 2003

This is now implemented among SearchTopicNameAndTopicText, SearchWithNoPipe, ArgumentListIsTooLongForSearch, InlineSearchArgListTooLong, SearchWebHasTooManyParameters, SiteMapIsSlow. At the same time I did some code refactoring. This is the reason why I did not apply Raymond's patch (no offense to your patch).

The NOTOPICSEARCH topic setting is an interesting idea. I do not think that we need this feature since the excludetopic parameter and %NOP% variables cover it.

This is in TWikiAlphaRelease and at TWiki.org. The TWikiVariables docs are updated.

-- PeterThoeny - 01 Nov 2003

...I just realized I'd forgotten to say anything when this went in. I've been wanting it for a loong time; so: Thank you! :-)

-- MattWilkie - 03 May 2004

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| excludetopic_patch | r1 | 3.2 K | 2003-10-28 - 21:22 | UnknownUser | |
Topic revision: r14 - 2006-01-03 - PeterThoeny
 