
Multiple Searches in Same Topic

A new multiple="on" parameter has been added to %SEARCH{}. Spec:

| *Parameter:* | *Description:* | *Default:* |
| multiple="on" | Multiple hits per topic. Each hit can be formatted. For a regular expression "and" search (tokens separated by ";"), only the last token is used to find the hits. | Only one hit per topic |

See the documentation in TWikiVariables and FormattedSearch.

Example: Pull all options from the TopicClassification topic and build a select form:

<form action="%SCRIPTURL%/view%SCRIPTSUFFIX%/%WEB%" method="get">
<select name="topic">
%SEARCH{ "META\:FIELD.*TopicClassification.*AdminTopic;\| +option[^\w]+\|" regex="on" nosearch="on" 
format="<option>$pattern(.*\|\s+([A-Za-z\. ]+)\s+\| +option[^\w]+\|.*)</option>" multiple="on" }%
</select>
<input type="submit" value="Go">
</form>

Renders as:

Note: This topic takes a while to render because the SEARCH above scans all Codev topics. It will render much faster once SearchTopicNameAndTopicText is implemented, i.e. once the search can be limited to the TopicClassification topic only.

-- PeterThoeny - 29 Sep 2003

Discussions

It seems there is no way to search for multiple instances of a pattern in a single TWiki page.

Use Cases

  • Multiple occurrences of a pattern such as "Todo: " within a topic can be collected into a single topic.
  • Retain the "context" in a topic, but use specific patterns to identify information, and aggregate information of a specific type on another page. For example, I may put a lot of information about a person on a single page, and then aggregate only the phone numbers on another page. There can be multiple phone numbers (following a fixed pattern such as "(Phone No: xxx (R))") on a single page.
  • I have many tasks defined in different projects (= different webs). Within a single topic that explains a particular feature, we might also define tasks generated out of that feature, and I might have multiple tasks defined for me for that feature. I would like to see them all on my home page (a portal page), organized as a tree of projects, with features under the projects and tasks under the features.

What is available today

When you run a search, the following algorithm is used (FormattedSearch):

  • Output pre-match strings (such as table headers etc.) if any.
  • First identify the scope of the search: a selected list of webs and topics, and whether to match the pattern against the topic name or also against the body. (To select a specific subset of topics, say all topics of type FeatureXyzXyz, first run a search that returns only the topic names matching a given pattern, then embed those results in another search.)
  • For each topic:
    • If the pattern match happens, then:
      • Define the variables $web, $topic, $pattern (and others; see FormattedSearch). Here $pattern extracts text with an arbitrary pattern, not necessarily the searched pattern.
      • Visit the format string and replace the variables with their values.
      • Output the result string.
  • Output post-match trailers (if any)

The key thing to note is when the outputs are generated.
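To make the timing concrete, here is a minimal Python model of the single-hit flow listed above. It is only a sketch, not TWiki's Perl implementation: topics are assumed to be (web, topic, text) tuples, formatted_search is a hypothetical name, and only $web, $topic and $text are substituted.

```python
import re
from typing import Iterable, List, Tuple

Topic = Tuple[str, str, str]  # (web, topic name, topic text) - assumed layout


def formatted_search(topics: Iterable[Topic], search: str, fmt: str,
                     header: str = "", footer: str = "") -> List[str]:
    """Single-hit model: at most one output line per matching topic."""
    out: List[str] = [header] if header else []   # pre-match output
    regex = re.compile(search)
    for web, topic, text in topics:               # scope: the given topics
        if regex.search(text):                    # first match is enough
            line = (fmt.replace("$web", web)
                       .replace("$topic", topic)
                       .replace("$text", text))   # text inserted last
            out.append(line)                      # one result per topic
    if footer:
        out.append(footer)                        # post-match trailer
    return out
```

A topic containing two "Todo:" lines still produces a single output line here, because output happens once per topic. That is the limitation the multiple-hit proposal addresses.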

Interpretation of Multiple matches within a topic

There are two changes to the algorithm: a new inner loop is added, and the interpretation of $pattern may change.

  • Output pre-match strings (such as table headers etc.) if any.
  • First identify the scope of search.
  • For each topic in scope:
    • If the pattern match happens, then:
      • For each pattern match:
        • Define the variables $web, $topic, $pattern (and others; see FormattedSearch). The suggestion here is to define variables $1, $2, $3 etc. instead of $pattern. These variables correspond to Perl's usual mechanism for selecting specific captured segments of a pattern match.
        • Visit the format string and replace the variables with their values.
        • Output the result string.
  • Output post-match trailers (if any)
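A sketch of the modified loop, assuming the proposed $1, $2, $3 variables are filled from the capture groups of each match. Python's re.finditer stands in for Perl's repeated m//g matching; the function names and the (web, topic, text) data layout are hypothetical.

```python
import re
from typing import Iterable, List, Sequence, Tuple

Topic = Tuple[str, str, str]  # (web, topic name, topic text) - assumed layout


def expand(fmt: str, web: str, topic: str, groups: Sequence) -> str:
    """Fill $web, $topic and the proposed $1..$n variables for one match."""
    line = fmt.replace("$web", web).replace("$topic", topic)

    def numbered(m):
        i = int(m.group(1))
        if 1 <= i <= len(groups):
            return groups[i - 1] or ""   # unmatched optional group -> ""
        return m.group(0)                # unknown $N is left as-is

    # Captured text is inserted last, so it is never re-expanded.
    return re.sub(r"\$(\d+)", numbered, line)


def multiple_search(topics: Iterable[Topic], pattern: str, fmt: str) -> List[str]:
    regex = re.compile(pattern)
    out: List[str] = []
    for web, topic, text in topics:
        for m in regex.finditer(text):   # the new inner loop: one hit per match
            out.append(expand(fmt, web, topic, m.groups()))
    return out
```

Each match produces its own output line, so a topic holding two "Phone No: ..." entries yields two hits, with $1 and $2 taken from that match's groups.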

Some Analysis on Whether We Should Enhance the Interpretation of the Format String

The webs, topics and searched patterns (which are specific segments within a topic) form a tree. The pattern match identifies certain paths from the root to leaf nodes in the tree. Every such path defines the path variables, and at the leaf node, more variables: $1, $2 etc. In the simplest case, the formatted output is a linear list of these paths. This linear list can be mapped to list elements using the "   * element" syntax, or to table syntax. However, we might actually want richer data structures such as trees, for example a list inside a list. How best can the interpretation of the format string be changed to accommodate this? For example, we might want to define format strings for every change in level (e.g. whenever there is a new web, a new topic, or a new pattern). And every such level change requires a header and a footer (think XML!).
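One way to read "a format string for every change in level" is sketched below: walk the sorted (web, topic, hit) paths and emit a new bullet level whenever the web or topic changes. This is a hypothetical Python illustration, not a proposed TWiki syntax. TWiki bullets nest by three spaces per level and need no closing footer, which keeps it short; an XML variant would also emit a closing tag whenever a level ends.

```python
from typing import List, Optional, Tuple

Path = Tuple[str, str, str]  # (web, topic, hit): one root-to-leaf path


def nested_bullets(paths: List[Path]) -> List[str]:
    """Emit a header line at each level change (hypothetical behavior)."""
    out: List[str] = []
    prev_web: Optional[str] = None
    prev_topic: Optional[str] = None
    for web, topic, hit in sorted(paths):    # sort so equal levels are adjacent
        if web != prev_web:                  # level change at depth 1
            out.append(f"   * {web}")
            prev_web, prev_topic = web, None # a new web resets the topic level
        if topic != prev_topic:              # level change at depth 2
            out.append(f"      * {topic}")
            prev_topic = topic
        out.append(f"         * {hit}")      # leaf: the matched segment
    return out
```

Read with webs as projects and topics as features, this produces the project > feature > task tree from the third use case above.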

Also note that the tree structure can be induced by the matched data itself. For example, assume that in the pattern, $1, $2, $3 match Name, Company, City. You then have a linear list of these, with multiple companies in a single city and multiple people in a single company. You want to produce XML with City as the outermost node, then Company, then Name. How do you define a format string that produces this kind of XML? (The default hierarchy of $web, $topic and $pattern doesn't help in this example because the matched data have no relation to topic names or webs.)
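Because the hierarchy here lives in the captured data, one workable approach is to collect the flat ($1, $2, $3) rows first and regroup them afterwards. The Python sketch below does that with xml.etree.ElementTree; the element names (cities, city, company, person) are made up for illustration, and sorting cities and companies alphabetically is an arbitrary choice.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # ($1, $2, $3) = (Name, Company, City)


def rows_to_xml(rows: List[Row]) -> str:
    """Regroup a flat list of matches into City > Company > Name."""
    tree: Dict[str, Dict[str, List[str]]] = {}
    for name, company, city in rows:
        tree.setdefault(city, {}).setdefault(company, []).append(name)

    root = ET.Element("cities")
    for city in sorted(tree):
        c = ET.SubElement(root, "city", name=city)
        for company in sorted(tree[city]):
            co = ET.SubElement(c, "company", name=company)
            for name in tree[city][company]:   # keep match order for people
                ET.SubElement(co, "person").text = name
    return ET.tostring(root, encoding="unicode")
```

The regrouping needs all rows before any output is written, which is why a per-hit format string cannot express it on its own.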

In essence, we are interested in implementing a search with a really good handle on inputs and outputs. Before I implement these ideas, I would like comments on whether this is the right approach, and whether there are more interesting use cases.

-- AmitTendulkar - 27 Aug 2002, and VinodKulkarni.

A multiple="on" parameter for this has been proposed by AntonioVega in Support.MultiExtractionWithinTopic.

-- PeterThoeny - 08 Feb 2003

I just took a look at the solution shown in Codev.FormattedSearchinTopics, and it looks very, very good.

The implementation I had in mind was very much like the familiar, current formatted search, except for the multiple="on" attribute (I am not saying the current one is not good, just letting you know my thinking). Note that multiple="6" might also serve to limit the number of hits per topic.

%SEARCH{ "Status=on;Dept=1" 
scope="text" regex="on" nosearch="on" nototal="on" 
multiple="on"
header="|*Subject*|*January activity in Dept 1*|*Category*|" 
format="|$topic|$pattern(.*?January\:[\n\r]*([^\n\r]+).*)|$formfield(Categ)|" }% 

The result may be something like:

| *Subject* | *January activities in Dept 1* | *Category* |
| Lathe3 | Check vibration | Machine |
| Lathe3 | Check oil level | Machine |
| Mill2 | Check belt | Machine |
| Sara | Training in course D023 | Guest |
| Sara | Plan trip | Guest |
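The multiple="6" idea amounts to capping the number of hits taken from each topic. A small Python sketch, assuming (hypothetically) that each activity sits on its own "January: ..." line rather than in the exact layout the $pattern() above expects; hits_per_topic is a made-up name.

```python
import re
from itertools import islice
from typing import Iterable, List, Optional, Tuple

Topic = Tuple[str, str]  # (topic name, topic text) - assumed layout


def hits_per_topic(topics: Iterable[Topic], pattern: str,
                   limit: Optional[int] = None) -> List[Tuple[str, str]]:
    """limit=None models multiple="on"; limit=6 models multiple="6"."""
    regex = re.compile(pattern, re.MULTILINE)
    hits: List[Tuple[str, str]] = []
    for topic, text in topics:
        # islice stops after `limit` matches; with None it takes them all
        for m in islice(regex.finditer(text), limit):
            hits.append((topic, m.group(1)))
    return hits
```

With limit=None every hit is kept, which corresponds to multiple="on"; a numeric limit is applied per topic, not to the total.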

After some reading, if I understood it correctly, an equivalent output for the above example with this patch would be:

%SEARCH{ "Status=on;Dept=1" hitformat="|$topic|$hit|$formfield(Categ)|<br>" 
scope="text" regex="on" nosearch="on" nototal="on" 
header="|*Subject*|*January activity in Dept 1*|" 
format="$pattern(.*?January\:[\n\r]*([^\n\r]+).*)"}%

Am I right? Please comment.

What I can see is that, for each topic that matches the search criteria, this patch has a way to write a "prolog" before the multiple matching takes place and an "epilog" after it. If I may suggest, prolog="..." and epilog="..." attributes might do the trick too. Those attributes might even support regular expressions if someone needs them (just like the format syntax). They could also be used when multiple="..." is not specified.
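The prolog/epilog suggestion can be modeled in a few lines. In this Python sketch, prolog and epilog are hypothetical parameters that wrap the hits of each matching topic; $topic is the only variable expanded in them, and topics without hits get neither.

```python
import re
from typing import Iterable, List, Tuple

Topic = Tuple[str, str]  # (topic name, topic text) - assumed layout


def search_with_prolog(topics: Iterable[Topic], pattern: str, fmt: str,
                       prolog: str = "", epilog: str = "") -> List[str]:
    """Wrap each topic's hits in a per-topic prolog/epilog (hypothetical)."""
    regex = re.compile(pattern, re.MULTILINE)
    out: List[str] = []
    for topic, text in topics:
        hits = [m.group(0) for m in regex.finditer(text)]
        if not hits:
            continue                       # non-matching topics print nothing
        if prolog:
            out.append(prolog.replace("$topic", topic))
        out.extend(fmt.replace("$topic", topic).replace("$text", h) for h in hits)
        if epilog:
            out.append(epilog.replace("$topic", topic))
    return out
```

Emitting the prolog only when a topic has at least one hit is one design choice; printing it for every topic in scope would be the other.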

Also, I am not clear whether the recent "and" implementation in search="regexp;regexp" still works, and whether this patch will be applied on twiki.org, so that everybody can try it in the Test web.

-- AntonioVega - 07 May 2003
-- AntonioVega - 09 May 2003
-- AntonioVega - 12 May 2003 Extracted from MultiExtractionWithinTopic

I have a need for this feature. The multiple="on" syntax is simple and fits the KISS principle of the TWikiMission. It is also easy to understand for folks who are familiar with grep. So I favor this syntax over the proposed FormattedSearchinTopics, even though that one would be more flexible.

For a "regex;regex;regex" search, we can document that the multiple switch only works on the last regex. That is, the last regex search is repeated, and each line found is returned according to the specified format="...".

In the format="..." we can support any of the existing variables, including multiple $pattern() on the line found. We also can return the whole line, like in a grep search. This is useful to grab all bullets in a topic for example, or all table rows.
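The rules above (AND across ";"-separated tokens, the last token repeated line by line, $text as the whole line, $pattern() applied to the line found) can be modeled roughly as follows. This is a Python sketch, not TWiki's Search.pm: the function names are hypothetical, $pattern() returns only its first capture group, and, as an assumption that matches the examples in this topic, each $pattern() argument must end in ".*" so it can be located inside the format string.

```python
import re
from typing import Iterable, List, Tuple

Topic = Tuple[str, str]  # (topic name, topic text) - assumed layout

# Assumption: every $pattern() argument ends in ".*" (as in this topic's
# examples), so the non-greedy match stops at the right closing parenthesis.
PATTERN_VAR = re.compile(r"\$pattern\((.*?\.\*)\)")


def render_hit(fmt: str, topic: str, line: str) -> str:
    """Expand each $pattern(...), then $topic and $text (= whole line found)."""
    def extract(m):
        found = re.match(m.group(1), line)
        if found is None or found.re.groups == 0:
            return ""
        return found.group(1) or ""        # first capture group only
    out = PATTERN_VAR.sub(extract, fmt)
    return out.replace("$topic", topic).replace("$text", line)


def multiple_and_search(topics: Iterable[Topic], search: str, fmt: str) -> List[str]:
    """AND-search on "a;b;c": every token must match the topic, and only
    the last token is repeated line by line to produce the hits."""
    tokens = [re.compile(t, re.MULTILINE) for t in search.split(";")]
    out: List[str] = []
    for topic, text in topics:
        if not all(t.search(text) for t in tokens):
            continue                       # topic fails the AND-search
        for line in text.splitlines():     # grep-like: one hit per line
            if tokens[-1].search(line):
                out.append(render_hit(fmt, topic, line))
    return out
```

Fed with a made-up TopicClassification text, the select-form example at the top of this topic yields one <option> per "option" table row. Splitting on ";" also means that, in this model, a regex containing a literal semicolon cannot be used as a token.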

  • Question: Which is better, using $text to output the whole line found, or a new $line variable?
    • A: Better to use $text, since it is easier to document the exception to $text in the multiple help text than to document "but" cases in many places. -- PeterThoeny - 28 Sep 2003

Now, rereading the above spec by AmitTendulkar and VinodKulkarni, I see that they propose something else: apply exactly one $pattern(), which might be different from the search term. For the application where I need the multiple-hit feature, either syntax would work.

  • Question: Which spec is more useful?
    1. Apply a separate search (Amit spec)
    2. Or use the same search, optionally with multiple patterns on the line found (Peter spec)

I really like the idea of supporting $1, $2, $3, etc. It is a very flexible way to grab more than one thing out of one pattern, much like Perl's s/from/to/ operation. This is a new feature that deserves a separate FormattedSearchWithSubstitution topic.

-- PeterThoeny - 27 Sep 2003

Answering my own question on $text vs. $line variable (see above). Feedback appreciated.

I am currently working on the implementation of this feature, together with FormattedSearchWithSeparatorParameter.

-- PeterThoeny - 27 Sep 2003

This is now in TWikiAlphaRelease and at TWiki.org.

Please help in testing it out. FormattedSearchFormTesting is one way of doing it. Example:

Now, yet another enhancement is needed so that lines of just one particular topic can be searched and formatted: SearchTopicNameAndTopicText. (The same fix is also needed to address the SiteMapIsSlow issue.)
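Why SearchTopicNameAndTopicText matters for multiple hits can be shown with a rough model: if the topic-name filter runs before any topic text is read, the line-by-line scan touches one topic instead of every topic in the web. The function and the load callback below are hypothetical, and in this sketch the topic-name regex must match the whole name.

```python
import re
from typing import Callable, Iterable, List


def search_topic_then_text(names: Iterable[str], load: Callable[[str], str],
                           topic_regex: str, text_regex: str) -> List[str]:
    """Filter on topic name first; only matching topics are loaded and scanned."""
    name_re, text_re = re.compile(topic_regex), re.compile(text_regex)
    hits: List[str] = []
    for name in names:
        if not name_re.fullmatch(name):
            continue                      # never read this topic's text
        for line in load(name).splitlines():
            if text_re.search(line):
                hits.append(f"{name}: {line}")
    return hits
```

The cost of the scan then scales with the number of topics whose names match rather than with the size of the whole web, which is the same saving the SiteMapIsSlow issue needs.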

-- PeterThoeny - 28 Sep 2003

Interesting, and also moderately confusing. This will need some pretty careful documentation, especially good examples. Making sense of the relevance of SearchTopicNameAndTopicText is tough, although I can see that the scope or virtual web proposal in this topic would add considerable power.

-- JohnTalintyre - 30 Sep 2003

Good examples are needed, yes. I added an example to the top of this topic. I do not understand how/why virtual webs are needed in the context of multiple hits.

Small update is in TWikiAlphaRelease:

  • Support for case sensitive/insensitive search when multiple is on
  • Do not render early $text if multiple is on

-- PeterThoeny - 29 Sep 2003

To make it easier to format results with multiple hits per topic, I would like to propose the following additional parameters: topicheader, topicseparator, topicfooter and footer.

-- SamHasler - 22 Oct 2003

Sam, better to have one topic per feature. Your request should go into a separate topic.

-- PeterThoeny - 26 Oct 2003

Topic revision: r23 - 2020-04-26 - PeterThoeny
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.