[^a-zA-Z0-9] is (was?) included in %SEARCH{ "^%TOPIC%[^a-zA-Z0-9]" to limit matches to strings that match the complete search pattern and nothing more. In other words, if %TOPIC% is currently "ThisTopic", the pattern is intended to match the string "ThisTopic" but not, for example, "ThisTopicAsWell".

This was a workaround because, at one time, the $ anchor did not work in TWiki. This has since been fixed, and the desired result can now be achieved with "^%TOPIC%$". In fact, the original "^%TOPIC%[^a-zA-Z0-9]" does not accomplish quite the desired result. Quoting PeterThoeny from Support.DisplayFormFieldinTopic:

"^%TOPIC%[^a-zA-Z0-9]" is not the correct regex to find exactly %TOPIC% without leading or trailing characters. This is because the regex finishes scanning with a match at "not followed by a alphanumeric character".

"^%TOPIC%$" is the correct regex. Note that this requires the CantAnchorSearchREToEnd fix.

-- PeterThoeny - 01 Aug 2002
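
The difference between the two patterns quoted above can be tried out in a small sketch. This uses Python's re module as a stand-in (an assumption: TWiki's own search engine may behave differently in details), and the topic name and candidate strings are made up for illustration:

```python
import re

topic = "ThisTopic"  # hypothetical value that %TOPIC% expands to

# The old workaround: topic name followed by one non-alphanumeric character.
workaround = re.compile("^" + re.escape(topic) + "[^a-zA-Z0-9]")
# The corrected form: topic name anchored at both ends.
anchored = re.compile("^" + re.escape(topic) + "$")

# Made-up strings to search; not taken from a real TWiki web.
candidates = ["ThisTopic", "ThisTopicAsWell", "ThisTopic.old", "ThisTopic "]


def matches(pattern, names):
    """Return the names in which the compiled pattern finds a match."""
    return [name for name in names if pattern.search(name)]


print("workaround:", matches(workaround, candidates))
print("anchored:  ", matches(anchored, candidates))
```

In this sketch the workaround rejects "ThisTopicAsWell" as intended, but it also rejects a bare "ThisTopic" (there is no following character for the bracket expression to consume) and accepts "ThisTopic.old". Only the "^...$" form matches exactly "ThisTopic" and nothing else.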

From the same page, here is the original inline search that sparked the question:

  • You type: %SEARCH{ "^%TOPIC%[^a-zA-Z0-9]" scope="topic" limit="1" regex="on" nosearch="on" nototal="on" format="$formfield(SupportStatus)" }%

Aside: "Regex" is an abbreviation for "regular expression". A regular expression is sometimes called a "pattern".

RichardDonkin provided the following good explanation, but it took me a little while to understand:

The [^a-zA-Z0-9] matches a single non-alphanumeric character - the idea is to only find ThisTopic not ThisTopicAsWell, since the pattern ThisTopic[^a-zA-Z0-9] will only match the former. Have a look at a regular expression tutorial and try writing some expressions yourself - much easier to experiment yourself as part of learning.

So I provide a little more explanation:

The "^" in a regex can mean two things:

  • at the beginning of a regex it serves as an "anchor", requiring that the regex start matching at the beginning of the string being searched
  • as the first character inside square brackets, it means "not": the bracket expression matches any single character that is not listed

In the listed regular expression, "^%TOPIC%[^a-zA-Z0-9]", "^" is used twice, once each way.

The string represented by the TWikiVariable %TOPIC% serves as part of the regex -- the "^" in front of it forces that portion to match at the beginning of the target string.

The second part of the regex, [^a-zA-Z0-9], uses "^" as the "not" operator, requiring that the next single character after the expanded %TOPIC% not be an alphanumeric character (not in a-zA-Z0-9). In other words, it must be a whitespace character (space, tab, newline, etc.), punctuation (.?!/|...), or some other non-alphanumeric character. Because one such character is required, the pattern does not match when the topic name is the very last thing in the string.
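
The two roles of "^" can be checked with a few one-liners. This is only a sketch in Python's re module (chosen for convenience, not because TWiki uses it), and the sample strings are invented:

```python
import re

# 1. Anchor: "^" at the start of the pattern pins the match to the start of the string.
anchored_hit = re.search(r"^Topic", "TopicOne")   # finds a match
anchored_miss = re.search(r"^Topic", "MyTopic")   # None: "Topic" is not at the start

# 2. Negation: "^" as the first character inside [...] means "any character NOT listed".
non_alnum = re.findall(r"[^a-zA-Z0-9]", "Ab1 .?_")

# A negated bracket expression still has to consume one character,
# so it cannot match when nothing follows the topic name.
needs_a_char = re.search(r"Topic[^a-zA-Z0-9]", "Topic")   # None

# 3. Inside brackets but not in first position, "^" is just a literal caret.
literal_caret = re.findall(r"[a^]", "a^b")

print(bool(anchored_hit), anchored_miss)   # True None
print(non_alnum)                           # [' ', '.', '?', '_']
print(needs_a_char)                        # None
print(literal_caret)                       # ['a', '^']
```

Note that the underscore counts as non-alphanumeric for [^a-zA-Z0-9], which is easy to overlook when thinking of it as "whitespace or punctuation".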

Open questions -- I think I provided enough explanation about the [^a-zA-Z0-9] in this regex, but there are other related questions that I am not sure about:

  • I understand that TWiki does some magic to recognize the plural form of a WikiName and always link it to the singular form of the WikiName. (I may have that backwards.) I'm somewhat surprised that nothing in this regular expression deals with that, such as something that searches for alternate WikiNames ending in "s", "es", or "ies" instead of "y". I'm not really clear about the exact purpose of this code, so this may be irrelevant.

Update: I've seen some more about the plural business since I wrote the above. TWiki does deal with the plural of various word forms, including those ending in y and some other strange ones, but does not deal with plurals in foreign languages. Also, it only deals with plurals in certain circumstances: IIRC, if you click on a plural TWikiWord (like RegexExpTWikiEndAnchorWorkarounds), it links to the page with the singular form as its title, RegexExpTWikiEndAnchorWorkaround. (I'll have to do a test some time and see what happens if a page with the plural form of the name exists.)

See AboutThesePages.

Rants

See MyRantings.

I feel like I should rant here, just because if I don't surely somebody else will.

I can think of two questions that someone might ask after reading this page:

  • Why in the world couldn't the "inventor(s)" of regexes come up with two different symbols instead of using "^" two different ways? (Same question for "$", and maybe others)

  • Why in the world do people think Linux / Unix / Open Source is a good thing if it includes stuff like this?

Do I have any good answers? Not really. I can probably develop a scenario that paints a logical picture of how these things might have come to be, but why I'm trying to learn this and similar stuff sometimes seems beyond me. Windows both does and doesn't have this kind of stuff: most "normal" searches in Windows don't and can't use regexes, but regexes can be found in things that run on Windows, like Perl, Cygwin (IIUC), etc. Still, if Windows mostly spares its users this kind of stuff, why wouldn't Windows be the preferred thing?

I guess the answer for me is that I strongly believe that, for the sake of all of us, Windows needs viable competition, and Linux / Open Source seems to be the only potential viable competitor for Windows at this time.

Why do I think Windows needs viable competition? Is it just price (the continued escalation of Windows price and license restrictions)? Why don't I feel the same way about the price of (medical) drugs? Well, actually I do, but I won't get into that here.

Contributors

  • () RandyKramer - 01 Jun 2002
  • <If you edit this page: add your name here; move this to the next line; and include your comment marker (initials), if you have created one, in parenthesis before your WikiName.>

Topic revision: r4 - 2002-08-01 - RandyKramer