Does TWiki need a Parser or Lexer?
As I have worked with tools that try to analyze TWiki topics - e.g. see
EditTopicTreeStructure,
DanglingLinksToolNeeded - I have noticed that there is no way to parse or lex TWiki markup language in general without knowing the semantics of all the plugins and other tools.
Maybe TWiki Markup should have a grammar and structure that can be lexed and parsed?
Although there are standards and conventions, such as %SomeFunction{args...}%,
apparently any TWiki plugin can add its own ad-hoc notations.
Some of these are useful, such as the smiley notations (e.g. :-) ).
Each plugin scans the text using regexps.
Several plugins (if not most) look for patterns such as
s/%EmbedTopic{.*}/.../
- the use of such regexps means that it is "legal" (in the sense that it is allowed by the Plugin)
to do things such as
%EmbedTopic{gg{}%
which creates the page "gg{".
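To make this concrete, here is a minimal Python sketch (real plugins are written in Perl and their exact regexps vary; the pattern below is modeled on the s/%EmbedTopic{.*}/.../ example above) showing how a naive pattern happily accepts the malformed invocation:

```python
import re

# Naive pattern modeled on the plugin regexp above: ".*?" knows nothing
# about brace balancing, so a stray "{" is swallowed into the topic name.
naive = re.compile(r'%EmbedTopic\{(.*?)\}%')

# Well-formed invocation
print(naive.search('%EmbedTopic{SomeTopic}%').group(1))  # SomeTopic

# Malformed invocation: the unbalanced "{" is accepted as part of the name
print(naive.search('%EmbedTopic{gg{}%').group(1))        # gg{
```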
Things get worse when such ad-hoc notations are combined with other plugins, such as
TreePlugin, which acts like a macro. TREEVIEW can be passed a formatting argument which expands to invocations of other plugins. Therefore, something like the following is desired:
%TREEVIEW{format="%EmbedTopic{$topic}%"}%
i.e. %EmbedTopic{$topic}% will be passed as an argument to TREEVIEW.
But if TREEVIEW uses a /%foo{.*}/ regexp, things will break.
Obviously, TREEVIEW needs some notion of quoting - some way of requiring that unquoted braces be balanced,
while braces inside quotes may be unbalanced. I.e. TREEVIEW, and probably other plugins and tools, need some way of parsing TWiki markup,
including markup from plugins that they do not know about.
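As a sketch of what such brace-aware parsing could look like (hypothetical code, not drawn from any actual plugin), here is a scanner that treats unquoted braces as structure and quoted braces as literal text:

```python
# Sketch only: unquoted braces must balance; braces inside double quotes
# are literal, so nested plugin invocations survive inside a quoted arg.
def extract_args(text, start):
    """Return the argument string of a %TAG{...}% whose opening brace
    is at index `start`, honoring quotes and nesting."""
    depth, i, in_quote = 0, start, False
    while i < len(text):
        c = text[i]
        if c == '"':
            in_quote = not in_quote
        elif not in_quote:
            if c == '{':
                depth += 1
            elif c == '}':
                depth -= 1
                if depth == 0:
                    return text[start + 1:i]
        i += 1
    raise ValueError("unbalanced braces")

s = '%TREEVIEW{format="%EmbedTopic{$topic}%"}%'
print(extract_args(s, s.index('{')))  # format="%EmbedTopic{$topic}%"
```

Note that the nested %EmbedTopic{$topic}% comes through intact, which is exactly what the TREEVIEW example above requires.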
In
EditTopicTreeStructure I mention problems that I had with quoting arguments passed to TREEVIEW. These problems arose for exactly the reasons described above.
TWiki friendly parsers
It is ironic that I post this, since I have long been an advocate of recursive descent parsers composed of independent modules, rather than of lexers and parsers that are centrally controlled, and which constrain notation. You can't create an ad-hoc notation in the parser if the lexer obliterates it...
But maybe it is not so ironic, since I have also developed a few systems that try to create formal structures to allow ad-hoc recursive descent parsing systems to be created, which are nonetheless well behaved.
Standard parsers and lexers like Lex and Yacc are stupid, in that they require "global" knowledge of all of the operators.
The key insight of
XML is that anyone can parse
XML. You may not know what a construct like
<foo> <biz/> <bif> ... </bif> </foo>
means, but anyone can parse it. Balanced expressions are easily found; imbalanced expressions are quoted in a natural way.
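For example, a generic XML parser handles elements it has never seen before - the structure (balance, nesting, quoting) is uniform, so no plugin-specific knowledge is needed:

```python
# A generic XML parser can recover the structure of unknown tags.
import xml.etree.ElementTree as ET

doc = ET.fromstring('<foo> <biz/> <bif> text </bif> </foo>')
print([child.tag for child in doc])  # ['biz', 'bif']
```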
Perhaps a TWiki markup parser could be inspired by
XML. Perhaps - this is probably too far out, but I mention it because it seems
right - TWiki markup could be mapped to
XML, so that
XML-like extensible parsing could be applied. E.g. perhaps TWiki's % oriented syntax could be mapped in a standard way to
XML's <>.
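A toy sketch of what such a mapping might look like (the translation rule here is invented for illustration; real TWiki arguments would need proper quoting and escaping):

```python
import re

# Hypothetical mapping: %FOO{args}% -> <FOO args="..."/>, %FOO% -> <FOO/>.
# No escaping is done; this only shows the shape of the idea.
def percent_to_xml(text):
    def repl(m):
        name, args = m.group(1), m.group(2)
        if args is None:
            return '<%s/>' % name
        return '<%s args="%s"/>' % (name, args)
    return re.sub(r'%([A-Z][A-Za-z]*)(?:\{(.*?)\})?%', repl, text)

print(percent_to_xml('%TOC% and %EmbedTopic{SomeTopic}%'))
# <TOC/> and <EmbedTopic args="SomeTopic"/>
```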
(More blue-sky - maybe quoting problems can be resolved by saying
who a chunk of text is being quoted for.)
Anyway - in the meantime, I continue to use ad-hoc regexp parsing, which may or may not match what the plugins in question are actually doing.
--
Contributors: AndyGlew
Discussion
Many of us have been working in similar directions. One of my ideas is that we start to change the syntax to use
%VARIABLE{}% format for everything - so a plugin such as smiley would transform
:) into
%SMILEY{":)"}% on save. A similar thing could be done for your XML idea (not that I like it).
We keep working around this sort of transformation, but at some stage, we will need to decide if TWiki is to change, or (as it seems is the case at the moment) if those changes would result in
NOT TWiki.
--
SvenDowideit - 27 Mar 2006
Hm, I'm wondering how all the Wikiness introduced by the
TWikiExtensions could be handled by a more formal
XML approach, maybe even using something like
DITA as underlying (or intermediate) standard. Would our beloved Wiki lose its Wikiness? -- Not necessarily, I say.
--
FranzJosefSilli - 27 Mar 2006
TWiki needs
something, since even after finding the language slowdown it's running slow. People who know more about Perl than I do have said that it's the overuse of regexes, which suggests that, yes, TWiki needs a parser.
XML? Bletch.
--
MeredithLesly - 30 Mar 2006
Bugs:Item1571
has a patch against 4.0.0 for "strict" (not "compatible") matching of curly brackets (nested
TML /
TWikiVariables). Tastes a bit like parsing.
--
SteffenPoulsen - 30 Mar 2006