Bug: %SEARCH{}% does not correctly handle topic names containing non-alphanumeric characters
If the topic name in a %SEARCH{}% command contains non-alphanumeric characters (for example "-" or "ä"), the matching topics are not returned by SEARCH.
CannotExcludeL10NTopicFromSearch describes the same kind of problem for the excludetopic parameter and non-English words.
Test case
%SEARCH{ "blabla" topic="Foo001-001" nosearch="on" format="found: $topic"}%
doesn't return anything even if page (topic) "Foo001-001" exists and contains "blabla",
while
%SEARCH{ "blabla" topic="Foo001*" nosearch="on" format="found: $topic"}%
returns "found: Foo001-001".
(In my case, "-" is authorised in wiki page names because the names represent part numbers, but
the problem is the same with non-English characters such as é, è, à, ä, as mentioned in
CannotExcludeL10NTopicFromSearch.)
Environment
--
NicolasRaibaut - 17 Jan 2006
Impact and Available Solutions
Follow up
Fix record
The problem comes from the _makeTopicPattern subroutine in lib/TWiki/Search.pm.
This subroutine, which reformats both searched topic names and excluded topic names, assumes
that topic names consist strictly of alphanumeric characters ([A-Za-z0-9]):
s/[^\*\_$TWiki::regex{mixedAlphaNum}]//go;
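Every other character, except "*" and "_", is stripped, so "Foo001-001" becomes "Foo001001" and no longer matches the existing topic.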
--
NicolasRaibaut - 17 Jan 2006
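The effect of the substitution quoted above can be reproduced in isolation. The following is only a sketch: $mixedAlphaNum is a plain ASCII stand-in for $TWiki::regex{mixedAlphaNum} (whose real value is built by TWiki and is assumed here, not copied), and strip_topic_chars is a hypothetical helper, not a function from Search.pm.

```perl
use strict;
use warnings;

# ASSUMPTION: ASCII stand-in for $TWiki::regex{mixedAlphaNum}.
# The real value is defined by TWiki and may differ (e.g. with locales).
my $mixedAlphaNum = 'A-Za-z0-9';

# Hypothetical helper that reproduces only the stripping step
# quoted in the report, not the whole of _makeTopicPattern.
sub strip_topic_chars {
    my ($topic) = @_;
    # Delete everything that is not "*", "_" or alphanumeric.
    $topic =~ s/[^\*\_$mixedAlphaNum]//go;
    return $topic;
}

print strip_topic_chars('Foo001-001'), "\n";   # prints Foo001001
print strip_topic_chars('Foo001*'),    "\n";   # prints Foo001*
```

Under these assumptions the "-" disappears from the exact topic name, while the wildcard form is left untouched, which is consistent with the two results in the test case.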
Partial workaround: add the authorised characters (for example "-") to the character class in the
s/[^\*\_$TWiki::regex{mixedAlphaNum}]//go; command. This does not solve the problems described in
CyrillicWikiWordError and
GermanUmlauteBreakWikiWords.
Ideally, _makeTopicPattern should:
- remove only the forbidden characters specified in $securityFilter in TWiki.cfg,
- and/or handle the character set authorised in WikiWords (in the same way as
lib/TWiki/Render.pm, after applying the patches proposed in CyrillicWikiWordError and GermanUmlauteBreakWikiWords, although this would be somewhat more complicated).
--
NicolasRaibaut - 19 Jan 2006
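The first option above (removing only forbidden characters) might look roughly like the sketch below. The filter shown is a hypothetical example written for this illustration; the real $securityFilter lives in TWiki.cfg and its exact value is not reproduced here. strip_forbidden_chars is likewise a hypothetical name, not an existing TWiki function.

```perl
use strict;
use warnings;

# ASSUMPTION: illustrative filter only, not the value from TWiki.cfg.
# It lists characters to reject and deliberately leaves out "*" and "-".
my $securityFilter = qr/[\\?~^\$\@%`"'&;|<>\x00-\x1F]/;

# Hypothetical replacement for the stripping step: delete only the
# characters matched by the filter and keep everything else.
sub strip_forbidden_chars {
    my ($topic) = @_;
    $topic =~ s/$securityFilter//g;
    return $topic;
}

print strip_forbidden_chars('Foo001-001'), "\n";   # prints Foo001-001
print strip_forbidden_chars('Foo001*'),    "\n";   # prints Foo001*
print strip_forbidden_chars('Foo;001|'),   "\n";   # prints Foo001
```

With a blacklist of this kind, any character that is not explicitly forbidden passes through, so "-" in part numbers is preserved. Wildcards such as "*" would have to stay out of whatever filter is applied, since the test case relies on them.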
Discussion