Does TWiki need a Parser or Lexer?
As I have worked with tools that try to analyze TWiki topics - e.g. see
EditTopicTreeStructure,
DanglingLinksToolNeeded - I have noticed that there is no way to parse or lex TWiki markup language in general without knowing the semantics of all the plugins and other tools.
Maybe TWiki Markup should have a grammar and structure that can be lexed and parsed?
Although there are standards and conventions, such as %SomeFunction{args...}%,
apparently any TWiki plugin can add its own ad-hoc notations.
Some of these are useful, such as the smiley notations (e.g. :-) ).
Each plugin scans the text using regexps.
Several plugins (if not most) look for patterns such as
s/%EmbedTopic{.*}/.../
- the use of such regexps means that it is "legal" (in the sense that it is allowed by the Plugin)
to do things such as
%EmbedTopic{gg{}%
which creates the page "gg{".
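To make this concrete, here is a minimal Python sketch (real plugins are written in Perl and their exact regexps vary; the pattern below is modeled on the s/%EmbedTopic{.*}/.../ example above) showing how a naive pattern happily accepts the malformed invocation:

```python
import re

# Naive pattern modeled on the plugin regexp above: ".*?" knows nothing
# about brace balancing, so a stray "{" is swallowed into the topic name.
naive = re.compile(r'%EmbedTopic\{(.*?)\}%')

# Well-formed invocation
print(naive.search('%EmbedTopic{SomeTopic}%').group(1))  # SomeTopic

# Malformed invocation: the unbalanced "{" is accepted as part of the name
print(naive.search('%EmbedTopic{gg{}%').group(1))        # gg{
```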
Things get worse when such ad-hoc notations are combined with other plugins, such as
TreePlugin, which acts like a macro. TREEVIEW can be passed a formatting argument which expands to invocations of other plugins. Therefore, something like the following is desired:
%TREEVIEW{format="%EmbedTopic{$topic}%"}%
i.e. %EmbedTopic{$topic}% will be passed as an argument to TREEVIEW.
But if TREEVIEW uses a /%foo{.*}/ regexp, things will break.
Obviously, TREEVIEW needs some notion of quoting - some way of requiring that unquoted braces be balanced,
while braces inside quotes may be unbalanced. I.e. TREEVIEW, and probably other plugins and tools, need some way of parsing TWiki markup,
including markup from plugins that they do not know about.
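As a sketch of what such brace-aware parsing could look like (hypothetical code, not drawn from any actual plugin), here is a scanner that treats unquoted braces as structure and quoted braces as literal text:

```python
# Sketch only: unquoted braces must balance; braces inside double quotes
# are literal, so nested plugin invocations survive inside a quoted arg.
def extract_args(text, start):
    """Return the argument string of a %TAG{...}% whose opening brace
    is at index `start`, honoring quotes and nesting."""
    depth, i, in_quote = 0, start, False
    while i < len(text):
        c = text[i]
        if c == '"':
            in_quote = not in_quote
        elif not in_quote:
            if c == '{':
                depth += 1
            elif c == '}':
                depth -= 1
                if depth == 0:
                    return text[start + 1:i]
        i += 1
    raise ValueError("unbalanced braces")

s = '%TREEVIEW{format="%EmbedTopic{$topic}%"}%'
print(extract_args(s, s.index('{')))  # format="%EmbedTopic{$topic}%"
```

Note that the nested %EmbedTopic{$topic}% comes through intact, which is exactly what the TREEVIEW example above requires.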
In
EditTopicTreeStructure I mention problems that I had with quoting arguments passed to TREEVIEW. These problems arose for exactly the reasons described above.
TWiki friendly parsers
It is ironic that I post this, since I have long been an advocate of recursive descent parsers composed of independent modules, rather than of lexers and parsers that are centrally controlled, and which constrain notation. You can't create an ad-hoc notation in the parser if the lexer obliterates it...
But maybe it is not so ironic, since I have also developed a few systems that try to create formal structures to allow ad-hoc recursive descent parsing systems to be created, which are nonetheless well behaved.
Standard parsers and lexers like Lex and Yacc are stupid, in that they require "global" knowledge of all of the operators.
The key insight of
XML is that anyone can parse
XML. You may not know what a construct like
<foo> <biz/> <bif> ... </bif> </foo>
means, but anyone can parse it. Balanced expressions are easily found; imbalanced expressions are quoted in a natural way.
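For example, a generic XML parser handles elements it has never seen before - the structure (balance, nesting, quoting) is uniform, so no plugin-specific knowledge is needed:

```python
# A generic XML parser can recover the structure of unknown tags.
import xml.etree.ElementTree as ET

doc = ET.fromstring('<foo> <biz/> <bif> text </bif> </foo>')
print([child.tag for child in doc])  # ['biz', 'bif']
```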
Perhaps a TWiki markup parser could be inspired by
XML. Perhaps - this is probably too far out, but I mention it because it seems
right - TWiki markup could be mapped to
XML, so that
XML-like extensible parsing could be applied. E.g. perhaps TWiki's % oriented syntax could be mapped in a standard way to
XML's <>.
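A toy sketch of what such a mapping might look like (the translation rule here is invented for illustration; real TWiki arguments would need proper quoting and escaping):

```python
import re

# Hypothetical mapping: %FOO{args}% -> <FOO args="..."/>, %FOO% -> <FOO/>.
# No escaping is done; this only shows the shape of the idea.
def percent_to_xml(text):
    def repl(m):
        name, args = m.group(1), m.group(2)
        if args is None:
            return '<%s/>' % name
        return '<%s args="%s"/>' % (name, args)
    return re.sub(r'%([A-Z][A-Za-z]*)(?:\{(.*?)\})?%', repl, text)

print(percent_to_xml('%TOC% and %EmbedTopic{SomeTopic}%'))
# <TOC/> and <EmbedTopic args="SomeTopic"/>
```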
(More blue-sky - maybe quoting problems can be resolved by saying
who a chunk of text is being quoted for.)
Anyway - in the meantime, I continue to use ad-hoc regexp parsing, which may or may not match what the plugins in question are actually doing.
--
Contributors: AndyGlew
Discussion
Many of us have been working in similar directions. One of my ideas is that we start to change the syntax to use
%VARIABLE{}% format for everything - so a plugin such as smiley would transform
:) into
%SMILEY{":)"}% on save. A similar thing could be done for your XML idea (not that I like it).
We keep working around this sort of transformation, but at some stage, we will need to decide if TWiki is to change, or (as it seems is the case at the moment) if those changes would result in
NOT TWiki.
--
SvenDowideit - 27 Mar 2006
Hm, I'm wondering how all the Wikiness introduced by the
TWikiExtensions could be handled by a more formal
XML approach, maybe even using something like
DITA as underlying (or intermediate) standard. Would our beloved Wiki lose its Wikiness? -- Not necessarily, I say.
--
FranzJosefSilli - 27 Mar 2006
TWiki needs
something, since even after finding the language slowdown it's running slow. People who know more about Perl than I do have said that it's the overuse of regexes, which suggests that, yes, TWiki needs a parser.
XML? Bletch.
--
MeredithLesly - 30 Mar 2006
Bugs:Item1571
has a patch against 4.0.0 for "strict" (not "compatible") matching of curly brackets (nested
TML /
TWikiVariables). Tastes a bit like parsing.
--
SteffenPoulsen - 30 Mar 2006