
Question

My users have entered all their data, such as telephone numbers, email addresses, messenger ids. I now want to create a search that extracts this information and shows it as a table. Here is what I have so far:

%SEARCH{"Email", web="Main", regex="on", header=" | *Person* | *Mobile* | *Email* | *Messenger* | |", format=" | $topic | $pattern(.*?\*.*?Company Name\:\s*([^\n\r]+).*) | $pattern(.*?\*.*?Email\:\s*([^\n\r]+).*) | $pattern(.*?\*.*?Country\:\s*([^\n\r]+).*) | $pattern(.*?\*.*?City\:\s*([^\n\r]+).*) | $pattern(.*?\*.*?Hear from\:\s*([^\n\r]+).*) "}%

The problem is that any questions they leave unanswered end up in the output.

Environment

TWiki version: TWikiRelease01Feb2003
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: RedHat Linux
Web server: Apache
Perl version: 5.6.1
Client OS: XP
Web Browser: IE6

-- TWikiGuest - 09 Sep 2003

Answer

I tried the above query on my own machine. I could not get it to work here at TWiki.org, but the closest demo of the problem is the search below. See the line 'TWikiSpam': the entry shows up as "*Web server" when there is actually no value. I don't know why. Can anyone help?

%SEARCH{"Email", regex="on", header=" | *Topic* | *TWiki version* |", format="| $topic | $pattern(.*?\*.*?TWiki version\:\w*([^\n\r]+).*) |"}%

Searched: Email", regex="on", header=" | *Topic* | *TWiki version* |", format="| $topic | $pattern(.*?\*.*?TWiki version\:\w*([^\n\r]+).*) |

Results from Support web retrieved at 04:41 (GMT)

Question My users have entered all their data, such as telephone numbers, email addresses, messenger ids. I now want to create a search that extracts this information...
Number of topics: 1

-- MartinCleaver - 10 Sep 2003

This is most probably caused by the \s* after version\:, which scans over whitespace, including newlines. I have no time right now to investigate; I have to put my kids to bed...
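A minimal sketch of this effect, written in Python rather than as a TWiki search so that it can be run standalone. It assumes that $pattern applies the regex case-insensitively with '.' also matching newlines; the topic text and the extract helper are made up for illustration. With \s* the capture skips past the empty value and picks up the next bullet, while restricting the skip to spaces and tabs keeps it on the same line.

```python
import re

def extract(pattern: str, text: str) -> str:
    """Return capture group 1, or "" if the pattern does not match.

    Assumption: $pattern behaves like a case-insensitive match in which
    '.' also matches newlines.
    """
    m = re.match(pattern, text, re.DOTALL | re.IGNORECASE)
    return m.group(1) if m else ""

# Hypothetical topic text: "TWiki version" was left unanswered.
topic_text = (
    "   * TWiki version: \n"
    "   * Web server: Apache\n"
)

# \s* also matches line breaks, so the capture starts on the next line.
with_s = r".*?\*.*?TWiki version:\s*([^\n\r]+).*"

# [ \t]* stays on the same line; ([^\n\r]*) allows an empty value.
same_line = r".*?\*.*?TWiki version:[ \t]*([^\n\r]*).*"

print(repr(extract(with_s, topic_text)))     # '* Web server: Apache'
print(repr(extract(same_line, topic_text)))  # ''
```

The second pattern is only one possible tweak; it has not been tried against the real Support topics.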

Searching the Main web at TWiki.org currently does not work; see Codev.ArgumentListIsTooLongForSearch.

-- PeterThoeny - 10 Sep 2003

Peter is quite right - the problem is fixed above by using \w instead of \s in the regex. Since the bullet point is so commonly used, requires an unwieldy regex, and is too easy to get wrong, it is probably worth adding a specific $bulletextract(string) method to retrieve it. This could be turned into $pattern(.*?\*.*?string\:\w*([^\n\r]+).*) before hitting the formatter.
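To make the $bulletextract idea concrete, here is a rough Python sketch of such an expansion. Nothing like this exists in TWiki; bullet_pattern and bullet_extract are hypothetical names, and the sample topic text is invented. One deliberate difference from the regex quoted above: \w matches word characters rather than whitespace, so this sketch skips only spaces and tabs after the colon ([ \t]*) and lets the value be empty.

```python
import re

def bullet_pattern(label: str) -> str:
    """Build a $pattern-style regex for a '   * Label: value' bullet.

    Hypothetical helper: the label is escaped so that characters such as
    '.' or '(' in it are matched literally.
    """
    return r".*?\*.*?" + re.escape(label) + r":[ \t]*([^\n\r]*).*"

def bullet_extract(label: str, text: str) -> str:
    """Return the value after 'Label:', or "" if missing or left blank."""
    # Assumption: matching is case-insensitive and '.' matches newlines.
    m = re.match(bullet_pattern(label), text, re.DOTALL | re.IGNORECASE)
    return m.group(1).strip() if m else ""

# Invented user topic with one unanswered question (Email).
user_topic = (
    "   * Company Name: Acme\n"
    "   * Email: \n"
    "   * Country: Canada\n"
)

for label in ("Company Name", "Email", "Country", "City"):
    print(label, "->", repr(bullet_extract(label, user_topic)))
```

In a search, the format string would then only need the label for each column instead of repeating the full regex.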

FormattedSearch uses \s so if you agree you might want to change it to use \w.

I suspect that wacky spacing and the odd '|' in the table content cause the other formatting errors.

-- MartinCleaver - 10 Sep 2003

The other formatting error is caused by two different formats of the support template. We used to have bullets, then switched to a table. The regex can be tweaked to support both formats.
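One way such a tweak might look, sketched in Python. The exact layout of the table-based support template is not shown in this topic, so the row format used here (| *Label:* | value |) is an assumption, as are the helper names and the sample text. The idea is to allow an optional closing '*' and an optional '|' between the label and the value, and to stop the capture at a '|' or a line break.

```python
import re

def field_pattern(label: str) -> str:
    """Regex for '   * Label: value' or '| *Label:* | value |' (assumed)."""
    return r".*?" + re.escape(label) + r":\*?[ \t]*\|?[ \t]*([^|\n\r]*).*"

def field_extract(label: str, text: str) -> str:
    """Return the field value, or "" if it is missing or left blank."""
    # Assumption: matching is case-insensitive and '.' matches newlines.
    m = re.match(field_pattern(label), text, re.DOTALL | re.IGNORECASE)
    return m.group(1).strip() if m else ""

# Invented samples of the two template styles.
bullet_topic = (
    "   * TWiki version: TWikiRelease01Feb2003\n"
    "   * Web server: Apache\n"
)
table_topic = (
    "| *TWiki version:* | TWikiRelease01Feb2003 |\n"
    "| *Web server:* | |\n"
)

print(repr(field_extract("TWiki version", bullet_topic)))
print(repr(field_extract("TWiki version", table_topic)))
print(repr(field_extract("Web server", table_topic)))
```

A side effect of stopping at '|' is that a bullet value containing a '|' would be cut short, which may or may not be acceptable.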

-- PeterThoeny - 10 Sep 2003

Topic revision: r6 - 2004-01-02 - PeterThoeny
 
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.