
Brainstorming: TWiki without TML

A lot more people - especially new adopters - are using TWiki with WYSIWYG. Such users are not learning TML (TWiki Markup Language), except when they come to make new TWiki applications, when they are faced with a huge learning curve. TML was a great idea at the time - it allowed users to generate complex markup with minimum learning and very simple tools - but this change to more WYSIWYG in the user landscape means we should reconsider whether it still has such an important role.

One of the main pieces of work I have done over the last couple of years has been the development of the WysiwygPlugin. This plugin provides mappings from TML to HTML and HTML to TML, so I'm in a pretty good position to comment on the limitations of TML. More recently I have been working with a number of former JotSpot customers to port their topics from Jot to TWiki, and this has further highlighted this problem.

Please note that this is not a feature proposal. It is a discussion topic, intended to make developers (existing and potential) think out of the box.

What is TML Rendering?

Broadly speaking, Peter designed TWiki to separate the generation of HTML out into three phases; template instantiation, variable expansion, and TML rendering.
  • template instantiation - is the process of reading and expanding skin templates, and can be thought of as a "compile time" process
  • variable expansion - is the expansion of TWiki variables such as %SEARCH to TML and HTML
  • TML rendering - is the process of recognising TML constructs, such as *text* used to represent bolded text and | delimited tables, and expanding them to HTML.

This discussion relates to the third of these, TML rendering. The thesis of this article is that template instantiation and variable expansion are the "guts" of TWiki, and that TML rendering was a necessary, but now potentially redundant, step.
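The three phases can be pictured with a toy pipeline. This is only a sketch under loud assumptions: the template string, the variable table and the single bold rule are invented for illustration, and real TWiki interleaves these phases rather than running them strictly in sequence. The `render` flag shows what "not calling the TML rendering function" would mean.

```python
import re

# Hypothetical skin template and variable table, for illustration only.
TEMPLATE = "<html><body>%TEXT%</body></html>"
VARIABLES = {"TOPIC": "WebHome"}


def instantiate_template(template, text):
    """Phase 1: drop the topic text into the skin template."""
    return template.replace("%TEXT%", text)


def expand_variables(text, variables):
    """Phase 2: replace %NAME% with its value; unknown names are left alone."""
    return re.sub(r"%([A-Z]+)%",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)


def render_tml(text):
    """Phase 3: one toy TML rule, *bold* becomes <b>bold</b>."""
    return re.sub(r"\*(\S(?:.*?\S)?)\*", r"<b>\1</b>", text)


def view(text, render=True):
    """Run the phases in order; render=False skips TML rendering entirely."""
    page = instantiate_template(TEMPLATE, text)
    page = expand_variables(page, VARIABLES)
    return render_tml(page) if render else page
```

With `render=False` the first two phases still run, so a topic written in pure HTML with embedded variables would pass through unchanged apart from the variable expansion.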

What are the problems with TML?

TML is pretty powerful, but it has a number of problems that make it really challenging to convert to and from HTML.
  • Tables. TML tables are flat, and the markup you can have in table cells is extremely limited. Many uses of tables require presentation of complex content in tables. Also, the control you have over the presentation of tables is very limited. The TablePlugin ameliorates this somewhat, but is still constrained by the features the designers chose to incorporate; full CSS support is a long way off.
  • Lack of structure. For maximum usability, TML is freeform - it doesn't require the topic to be parseable, as TML constructs are expanded using regular expressions applied in a highly context-sensitive environment. This lack of structure leads to (thankfully very few) ambiguities, but it still defeats any possibility of creating a context-free parser for the language. This fundamentally limits the efficiency of the language. It also makes it very difficult to subdivide topics into structural elements - for example paragraphs, headings, tables etc - a process which is key for improving the editing experience, but also for addressing topic content externally (something a structured wiki ought to be able to do).
  • Store efficiency. As attractive as the flat text files approach is, it's hard to use modern database tools with TML. You have no choice but to store topics flat, which leads to problems searching, and compromises efficiency in the store.
  • Extensibility. Plugin authors often want to make small extensions to TML to enable extra functionality - for example, they may want to add new highlighting types, or new block structures that delimit specialised content. It is very difficult to extend TML in such a way that such specialised content doesn't conflict with extensions made by other plugins. It's also difficult to stop the core from imposing its own interpretation on specialised content.
  • Portability of content. Because TML constrains what can be done so heavily, importing content from other wikis is a real PITA. Even MediaWiki offers better table formatting support built into the language. Importing HTML requires running through the WysiwygPlugin, which inevitably strips out carefully designed features of the input HTML. Sometimes the only option is to import HTML as HTML, and accept that the result will be read-only topics.
  • Performance. A significant chunk of topic view time is spent rendering TML.
  • WYSIWYG. Modern browsers are designed with built-in DOM editing tools (MIDAS). It is really, really difficult to leverage these tools when you are restricted to TML.

What is the alternative?

Right now you can write a TWiki application in (almost) pure HTML. Such topics still have embedded TWiki variables (such as %TOPIC%) but don't use TML - except where it is generated as a result of the expansion of the TWiki variable, such as %SEARCH. A TWiki application written in HTML has to be written to avoid any TML constructs, which can be tricky, but it is feasible. This suggests that in many cases, TML can be eliminated from the rendering loop simply by not calling the TML rendering function.

Of course, some TWiki variables generate TML; the most glaring is %SEARCH, which generates its output this way. However it would be fairly straightforward to convert this to generate template-driven HTML; indeed, the opportunity could be taken to fix this when ResultSets are implemented.

What could go wrong?

Of course nothing is ever quite that simple. TML rendering is intimately interleaved with variable expansion; there are macro rendering loops and micro rendering loops which knot and spiral around each other in the core. I don't want to trivialise or underestimate the amount of work involved in moving from TML as the principal store form to HTML. Whoever takes it on is facing some considerable challenges.

One obvious problem is that some plugin authors generate TML. Plugins would have to be a lot more conscious of the structure of the topics they are modifying, and would have to generate structured output. Removing support for TML would have a considerable impact on such plugins. It would be interesting to see which commonly used plugins would be impacted by this; my suspicion is that there are only a few, and they are mostly related to repairing the holes in rendering left by TML (e.g. TablePlugin). Another important point for plugin authors is that HTML is hard to write, unless you are immersed in it. TML is a lot simpler to generate.

The main problem, however, would be compatibility with existing content. While plain topics could be run through WysiwygPlugin to convert the content to HTML, TWiki applications present a tougher challenge. Many TWiki applications generate TML, sometimes in quite obscure ways. There would be no escaping from this.

Steps on the road

Here's what I think would need to be done to make this a reality:
  • Eliminate core reliance on TML
  • Eliminate TML from skins
  • Tag in TWiki meta to say "this topic is HTML, don't render it". The same tag would disable WysiwygPlugin when WYSIWYG editing.
  • Analyse use of TML in key plugins, and port them to use HTML instead.

Nice to have:

  • Use WysiwygPlugin to convert stored HTML to TML for editing in the plain-text edit box.

-- Contributors: CrawfordCurrie - 05 Apr 2008

Discussion

Hm, can someone tell me why I don't like the thought of being 'restricted' to HTML? If you change the underlying topic content format, why not change to something more DITA like? Well, I will miss plain text editing anyway. For me that's part of the Wikiness of Wikis. When this is gone, you may as well use Office.

-- FranzJosefGigler - 05 Apr 2008

Crawford, you are spot on with your analysis. But I really think that to move in your direction requires a totally new "wiki" redesigned from scratch, with more focused goals. Making TWiki evolve will require an enormous amount of work and thousands of heart-breaking decisions on what compatibilities to break. One example of a from-scratch design by an ex-TWiki user can be seen at SweetWiki of MichelBuffa (who also uses JotSpot a lot). Also, like Franz, I think that TWiki now (with smartedit or the natskin editor) works so well for the personal / geek teams cases that I want it to stay stable so I can build content on it. The corporate case is the one that would benefit from a from-scratch rewrite. Maybe we need to make a clean evolutionary step, like perl6, python3000, ... and start from scratch - with a new name - while maintaining the current TWiki satisfying its current "market".

OTOH I may be totally wrong, one can see nice things such as the QuerySearch that quietly remove TML dependencies step by step. I think that the main pieces lacking from the puzzle to go this way would be (apart from the obvious, such as a structured backend storage and a WYSIWYG editor) a real server-side scripting language (server-side javascript would be cool, so one would have to know only one language for both server and client sides, but a widely used one such as lua could be a valid choice), and a template language with an efficient cache built-in and integrated with the chosen server-side script... but then maybe it is actually restarting it from scratch anyways...

-- ColasNahaboo - 05 Apr 2008

Even though I agree that a complete rewrite may allow us to do some amazing stuff, we don't have the people to do it in a reasonable time frame.

Anyway, if I understood well, what Crawford is proposing by "eliminate the core reliance on TML" is not something that is impossible to do, and even can be implemented in a backward-compatible mode.

MovableType has a feature I really like, which is that you can choose your input format for each post. And it's something quite simple: it doesn't even translate between dialects.

We can do something similar in TWiki. What I think needs to be done to "eliminate core reliance on TML" is:

  • The mechanism to process Templates and TWiki variables should be left as is
  • Extract the TML rendering code from its current place into a separate module (TWiki::Renderer::TML?)
  • Create some other renderers, to test the concept (TWiki::Renderer::MediaWiki, TWiki::Renderer::HTML?)
  • In the Edit template, allow the user to select from the available list of formats (changing a format may even trigger the use of Plugins.WysiwygPlugin as needed). TML is the default.
  • On save, store in TOPICINFO the format of the topic.
  • Upon view, in the rendering function, use the renderer that corresponds with the topic format (use TML if no format was specified, for backward compatibility)
  • Make sure that all the core TWiki Variables emit either more variables or HTML.
For convenience, we should add to the published API a function that renders a text in whatever format is passed as parameter.
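The renderer-per-format idea above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the registry, the function names, the `format` key standing in for whatever would be stored in TOPICINFO, and the single heading rule standing in for a full TML renderer. It is written in Python rather than TWiki's own Perl purely to keep the example short.

```python
import re

# Hypothetical registry mapping a topic format name to a rendering function.
RENDERERS = {}


def register(fmt):
    """Decorator that registers a renderer under a format name."""
    def wrap(fn):
        RENDERERS[fmt] = fn
        return fn
    return wrap


@register("tml")
def render_tml(text):
    # Toy stand-in for TML rendering: only '---+ Heading' is handled.
    return re.sub(r"^---\+ (.*)$", r"<h1>\1</h1>", text, flags=re.M)


@register("html")
def render_html(text):
    # Topics stored as HTML pass through untouched.
    return text


def render_topic(text, topic_info):
    """Pick the renderer from the topic's recorded format.

    Topics saved before the format was recorded have no 'format' key,
    so they fall back to TML for backward compatibility.
    """
    fmt = topic_info.get("format", "tml")
    return RENDERERS[fmt](text)
```

An unknown format raises a `KeyError` here; whether to fail or to fall back to TML in that case is a design choice the discussion leaves open.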

Then there is the issue that getRenderedVersion is called a zillion times. If we put the rule that templates cannot use any specific syntax, only valid HTML/XML and TWiki variables, then getRenderedVersion will be called a lot less.

Finally, there is the issue that INCLUDEd topics should be rendered in their formats. I don't know how difficult this will be, actually (it has been ages since I last saw that code).

One of the side effects of this approach is that it is up to the skin designer whether all the formats or only some of them will be available for edit.

Later on, if we want to remove the HTML dependency we can think about having an "OutputRenderer" component or something like that... but one step at a time.

-- RafaelAlvarez - 05 Apr 2008

While I think that doing away with the typographically oriented TML should be rather easy, there are more problems in editing, storing and processing tables, sections of text or lists. While variables are integrated into the parser using handlers, tables, sections and everything line-oriented aren't at all. Just take a look at the code of TablePlugin, EditTablePlugin, EditRowPlugin or SpreadSheetPlugin, but also RenderListPlugin and TreeviewPlugin: they all need a chunk of TML as their input. Each of these plugins implements its own redundant topic text parser and each of these island TML parsers is called multiple times for one request. That's because TML is thought to be a one-way thing only, generating HTML. Using it as a storage format does not allow enough semantics to ease the situation for all of the named plugins. Nor does HTML, IMHO. That is: HTML is a bad storage format as well. It may be a near hit to the format used to display content in browsers, but it is not suitable to carry enough semantics for the storage format of a CMS.

-- MichaelDaum - 05 Apr 2008

Colas, the goal here is to explore where TWiki can be taken. Implementing a new wiki might be easier, but it wouldn't be TWiki.

Rafael, pluggable syntaxes? Interesting idea. Plugging in different syntaxes takes you back towards the problem with TML, that the languages are not rich enough to represent each other, so transformation between them and the core representation will inevitably be lossy. People would end up being restricted to the intersection of all the supported formats used in their wiki, wouldn't they?

Michael, yes, but I suspect most of these plugins use their parsers to build internal analogues of the DOM. If there was a central parser that was able to present a pre-parsed DOM to them, arguably you could rationalise them all, though it would be a considerable effort.

The real problem for compatibility is the way that variable expansion and TML rendering are interleaved, so that a plugin can defer the expansion of some TML by clever encodings that the author knows won't be expanded until later - knots and spirals.

Obviously the storage format would not be HTML. JotSpot stores topics in HTML, but that HTML is stored in the context of an XML document, and may itself contain embedded XML, such as scripts. HTML does not have TWiki variables. Of course, TWiki variables are themselves freeform, unstructured, and can be used to generate new structured content, which strongly suggests that storing topics that contain TWiki variables in a structured DB is not possible.

I wonder if there might be some middle ground, where by flipping a switch in a topic you could choose to have WysiwygPlugin convert as much as possible back to TML (the current default mode) or you could tell it to leave content in HTML? That way you can choose to make the compromise between richness of the topic (WYSIWYG gives you the full power of HTML that isn't available with TML) versus the availability of plugins that process tables and lists in the topic. You could even narrow that down such that you could electively turn rendering on and off within a topic e.g. %TML{off}%..%TML{on}% (hmmmm, doesn't the <literal> tag do something close to that already?)
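A minimal sketch of how such an in-topic switch might behave, assuming the `%TML{off}%` / `%TML{on}%` markers suggested above and a hypothetical `render` callable standing in for the TML renderer. It ignores the interleaving with variable expansion, which is the hard part of the real problem.

```python
import re

# Matches the switch markers floated in the discussion; the syntax is a
# suggestion from the thread, not an existing TWiki variable.
SWITCH = re.compile(r"%TML\{(on|off)\}%")


def render_with_switch(text, render):
    """Apply `render` only to the parts of `text` where TML is switched on.

    Rendering starts switched on, and the markers themselves are removed
    from the output.
    """
    out = []
    rendering = True
    pos = 0
    for match in SWITCH.finditer(text):
        chunk = text[pos:match.start()]
        out.append(render(chunk) if rendering else chunk)
        rendering = match.group(1) == "on"
        pos = match.end()
    tail = text[pos:]
    out.append(render(tail) if rendering else tail)
    return "".join(out)
```

For example, passing `str.upper` as a stand-in renderer makes it easy to see which sections were processed and which were left alone.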

-- CrawfordCurrie - 06 Apr 2008

One could argue that perhaps 'replace TML' is throwing out the baby just because the bathwater has slightly cooled.

Most of the experienced TML authors separate out content topics from application topics, and this can be extended to make 2 types of topics, or even 2 types of webs.

Part of our trouble with Wysiwyg is that people seem to expect to mix things like complex settings with prose - a somewhat literate mash that really only makes sense to application developers that are into literate programming.

So.. if we separate concerns into APP (ie, TML) and Wysiwyg: ie XHTML fragments? with class and ids used to identify content to the APPs, and then the template selection mechanism for APP selection, are we resolving some of the perceived issues in an evolutionary fashion?

(Note, I'm not saying this is the answer, nor that I understand the question, but I do wonder)

-- SvenDowideit - 06 Apr 2008

Crawford, what I described has nothing to do with converting between these syntaxes: it makes no sense and may actually be impossible. Having pluggable syntaxes has three advantages:

  • It's easier to grok than TWiki Without TML.
  • Makes migration from other wikis easier.
  • Forces plugin developers not to rely on the TWiki renderer for their output, so they should output proper HTML.

Whether all the syntaxes, or just one of them, will be available to users is an implementation detail (I'm ok either way). The topic would be stored in the syntax chosen by the original author, and rendered as HTML.

The key barrier is the fact that TML and Variables expansion are intertwined, because I bet there are apps out there that rely on this fact to complete some incomplete TML before it's all rendered. If we untangle the rendering process, these apps will break. But then, we may break a lot of apps if

-- RafaelAlvarez - 06 Apr 2008

For sure we cannot leave the TML format and taking it out of the core is in my view not feasible.

We - the old existing users - have 10000s or 100000s of topics in TML and TWiki Applications that rely on the format.

And TML itself is not a bad format. Many users still prefer to edit in TML. HTML is so complex that normal people cannot edit the source. Once you leave simple wiki markup behind you are in Wysiwyg land only.

And one of the reasons TWiki Applications are so easy to make is the simple format of TML. Without the ease of making applications all you have left is MS Word.

The TML makes it easy both to let applications find information and generate information because the markup is so simple.

I know it is difficult to make Wysiwyg work with TML to HTML translation and I have been one of the strongest advocates to get a good Wysiwyg editor. But I do not want to give up TML. We have a wiki function at work which is pure HTML with Wysiwyg editing in the browser. But no application engine, no wiki words. The result is a wiki very few use because it cannot do more than you can do in a Word document uploaded on a file server.

It is important that we do not change TWiki from being a wiki to being a type writer application, a word processing program like Word, or even a CMS. Always remember what it is that makes TWiki unique and stand out from the rest.

Making TWiki Applications with searches that returns tables and forcing people to use HTML will be the end of TWiki as we know it. It will become too geeky. It will require intense knowledge about HTML and it will in reality become pure programming to make applications.

The middle ground solution Crawford is talking about may have some good sides and be worth exploring. But we should not give up TML. The simple markup is a strength in many ways.

To me it has never been important that the Wysiwyg editor could do very advanced things.

Doing headers, bullets, the most common formatting like bold/italic, inserting images, and above all decent table handling were the important features we needed.

We have now all those features. Problem is that they are buggy and that the TMCE displays features that do not work in TML. And then we have the I18N issues.

Getting the TMCE working stable and more bug free, limiting the available features to what is really needed, and getting UTF8 working is really what is missing.

I always become afraid when I see debates where some of you guys express the need to forget the past and start all over. We HAVE TO maintain compatibility. We have to protect the huge investment TWiki users have put in to the content of their TWiki. What ever is proposed must take this into account always. Otherwise TWiki is not a trustworthy product for the future.

-- KennethLavrsen - 06 Apr 2008

There is nothing before your comment suggesting a need to "forget the past and start all over". That's not implied anywhere in this discussion (except in Colas' suggestion of starting again from scratch).

Please remember (and this can be hard to understand) that the store format does not need to be the same as the editing format. This is amply demonstrated by Rafael's points, above. As long as you can make editing using TML available to users, and you meet the criteria for compatibility of existing TWiki databases, then you can store topics in whatever format works; there is no implication that you would have to force users to develop "intense knowledge about HTML".

As an illustration of this point, consider the following edit cycle:

  1. User views topic, hits edit button
  2. System retrieves "decorated HTML" (HTML with TWiki variables) from disc
  3. WysiwygPlugin runs HTML2TML on content
  4. Content - now plain text with embedded TWiki variables - is displayed in a TEXTAREA for editing
  5. User saves. System runs TML2HTML on the content to regenerate the stored form
  6. Resulting decorated HTML is stored in the DB.
As Rafael pointed out, you can replace HTML2TML and TML2HTML with HTML2MediaWiki and MediaWiki2HTML to allow editing in different dialects.
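The edit cycle above can be sketched with a pair of toy converters. These are not the WysiwygPlugin's converters: `html2tml`, `tml2html` and `edit_cycle` are hypothetical names, and only bold text is handled, which is just enough to show the round trip from stored "decorated HTML" to editable text and back.

```python
import re


def tml2html(tml):
    """Toy TML2HTML: *bold* becomes <b>bold</b>; everything else is untouched."""
    return re.sub(r"\*(\S(?:.*?\S)?)\*", r"<b>\1</b>", tml)


def html2tml(html):
    """Toy HTML2TML: the inverse of the single rule above."""
    return re.sub(r"<b>(.*?)</b>", r"*\1*", html)


def edit_cycle(stored_html, edit):
    """Steps 2 to 6 of the cycle: load, convert, edit, convert back.

    `edit` is a callable standing in for the user typing in the TEXTAREA.
    TWiki variables such as %TOPIC% are plain text to both converters,
    so they survive the round trip unchanged.
    """
    editable = html2tml(stored_html)   # step 3
    edited = edit(editable)            # step 4
    return tml2html(edited)            # step 5, result is stored in step 6
```

Swapping in a different pair of converters for another dialect would leave `edit_cycle` unchanged, which is the point made about HTML2MediaWiki and MediaWiki2HTML. Any HTML the toy converter does not recognise is shown to the user as raw HTML, so the quality of the editing experience depends entirely on how complete the converters are.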

As Micha says above, the important thing is to make sure that the semantics of the stored form are rich enough to represent all the editing dialects, including HTML/WYSIWYG. This is the key point of this discussion. Right now the stored form - TML - does not fulfill this criterion, and we are discussing how we might address this problem. One proposed solution is to extend the existing TML stored form - using %TML{off}% or equivalent - to support embedding sections with richer semantics - or, as Sven puts it, "separate complex settings from prose".

The technical detail is how we address the fact that rendering and variable expansion are intertwined, and deal with the fact that some plugins may be written to depend on this intertwining.

Sven, the "2 types of web" idea is an interesting one, especially in the light of applications we have seen in clients where the automation is concentrated in subwebs (or subsets of topics). It begs the question whether the "switch" idea needs to work at the fine granularity of subsections of topics, or whether it would suffice to make it apply at the whole topic/web level. I can imagine:

  • Set ALLOWTOPICTML = on
  • Set ALLOWWEBTML = on
settings controlling whether WYSIWYG attempts conversion back to TML or not (c.f. permissions).
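One way such settings might be resolved is sketched below. The precedence (topic setting wins over web setting, conversion allowed when neither is set) is an assumption made for this example; the discussion only names the two settings and points at permissions as the analogy.

```python
def tml_conversion_allowed(topic_prefs, web_prefs, default=True):
    """Decide whether WYSIWYG should convert content back to TML.

    Both arguments are plain dicts of preference name to value.
    Assumed precedence: ALLOWTOPICTML, then ALLOWWEBTML, then `default`.
    """
    for prefs, key in ((topic_prefs, "ALLOWTOPICTML"),
                       (web_prefs, "ALLOWWEBTML")):
        if key in prefs:
            return prefs[key].strip().lower() == "on"
    return default
```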

-- CrawfordCurrie - 07 Apr 2008

Problem is not as much the plugins using TML as it is the many many TWiki Applications out there that rely on the data stored in TML (SEARCH).

Making constant HTML2TML / TML2HTML is our major problem today. Why would that suddenly work?

I do not believe it is feasible to store topics in some obscure format and convert back and forth between TML and this format.

If we cannot make it work today with TMCE/Wysiwyg Plugin what makes people think we can make it work in a much more complicated application?

I think the users are going to see garbage half the time when editing raw TML and that topics are going to be changed uncontrolled during an edit raw/save cycle with such an implementation.

This topic started off discussing TWiki WITHOUT TML based on the trouble we know from our Wysiwyg implementation. And a few days later it has evolved into even more translations.

No - I prefer that the topics are stored in the same raw format that you edit raw. Then I would rather see a dual mode feature where the user can choose if the topic is stored in TML or in HTML (with embedded TML allowed enclosed in some form of TML on/off tags). But converting when editing raw - that I think is a near impossible task. It will never work well.

-- KennethLavrsen - 08 Apr 2008

I know these TWikiApplications all too well that parse bits out of a topic text using pattern(). None of these applications feel rock solid. Nor are they maintainable. The larger TWikiApplications become the less you use this kind of application. It is simply too expensive to do this kind of application on a large scale. Think of your customers once from that angle. Nor do these applications scale in the sense of computation.

No - I prefer that the topics are stored in a format that (a) information can be extracted easily in a well defined way and (b) that supports any markup you want to use to write them (html, tml, whatever).

Means: there is no innate necessity that the way content is produced and how it is stored must inevitably be the same. Au contraire, storing it in a semantically rich way will foster, not hinder, TWikiApplications as well as WYSIWYG and WikiMarkup.

-- MichaelDaum - 08 Apr 2008

What are the implications if TWiki has its own WYSIWYG stemming from TML? That is, creating TWiki's own 'TinyMCE' that streamlines with TML.

-- KwangErnLiew - 08 Apr 2008

The problem isn't the editor; WysiwygPlugin already does a pretty good job converting TML into the kind of DOM an editor requires, so you can use an HTML editor. The problem is TML; it just isn't rich enough to act as a base format for storing modern content.

-- CrawfordCurrie - 08 Apr 2008

By looking at it, an interesting solution (I have one week free in early May, I hope I will be able to implement something as a proof of concept on this - but feel free to beat me to it) would be to use a code editor in javascript, and make a TML mode for it. This will not solve the WYSIWYG case, but could properly replace smartedit/natedit and allow us to focus more on the wysiwyg now that the geek-mode is taken care of :-). Basically, TML is code, so it needs a code editor. Please look at http://en.wikipedia.org/wiki/Comparison_of_Javascript-based_source_code_editors

codemirror, helene, and 9ne seem promising but young.

-- ColasNahaboo - 08 Apr 2008

The markItUp! Universal Markup Editor looks great as well. Very flexible.

-- MichaelDaum - 08 Apr 2008

As follow up on the storage format. Please understand that our TWiki applications that seek and display information in a way that depends on the content being TML work fine today. Those applications work! They add value. They are great and so damned simple. Do not try to tell us existing customers that they do not and can be ignored or discarded. Many of our applications work by searching a small defined list of topics and displaying the information. An example is our hotline application that searches for information in our weekly reports. This application only searches in the active projects and only needs to search maybe 5 topics. Change the format of our .txt files and nothing will work anymore. Please have respect for the value that lies in our content. I say it again and again. The real accumulated value to us corporate users lies in the content and not in TWiki. TWiki has existed for so many years now and produced so much data that you cannot just ignore this and propose changing the foundation we have built this information on.

The problem with performance is when applications have to search all or most of the files in a web and the number of topics grows and grows. Typically applications where each topic is like a record in a database. Our bugs web is a good example. Or the Codev applications where the search has to look through ALL files searching for a form name in meta. These applications are the ones where searching TML flat files is hopeless. This is where we desperately need an indexed DB type storage.

Another issue with current regex searching is the geek level you need to be at to make these applications. For sure we need the storage format which is structured.

The existing-customer-aware way to proceed to 5.0 is to keep the good TML format in our .txt files and add an additional parallel storage format which TWiki uses for query type searches, access rights lookup etc etc. This parallel storage format can be any format. We have free hands. It can be xml. It can be an Oracle database. It can be anything. The purpose of this parallel storage should be fast and easy access to structured and indexed data.

Most old TWikis are not used for one application but 100s of small applications. Some have 100 or 200 webs and in each maybe 5-10 applications. And the application logic is distributed over 1000s of topics. You cannot just discard all this. And we do not have to discard this.

Maybe you consultants see most TWikis as a single application product because this is often what you are asked to implement for new customers. But this is not where TWiki is used to its full potential. It is in the larger companies where many departments are using a TWiki for all sorts of things that not even I could imagine. I still discover new small applications growing up here and there. There is no way to go back and rewrite all the existing content.

And the current storage format is actually quite good. It is readable and writable by humans (unlike HTML or XML or binary files). It is easy to keep an audit trail of the changes. It is easy to hack. A fact many of us take advantage of with small side scripts that hack the topics or even create topics. You can repair things when things go wrong. I did this just 2 days ago when some TWiki bug goofed up a topic. Our TML/Meta storage is a good old well working storage format. There is no need to discard it. Problem is that it is hopeless when it comes to indexing text, structural data, access rights etc. This is where the parallel storage comes into play.

So the best approach is to simply keep the .txt files as we know them, and implement a parallel storage - with the .txt files as the master. Ie. you can delete the parallel storage and rebuild it from the .txt files (how long that takes is not that critical).
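A minimal sketch of the "parallel storage with .txt as master" idea, assuming a `data/<Web>/<Topic>.txt` directory layout and using SQLite as the secondary store. The table layout and the function names are invented for illustration; the key property is that the index can be dropped and rebuilt from the text files at any time.

```python
import sqlite3
from pathlib import Path


def rebuild_index(data_dir, conn):
    """Throw away the parallel store and rebuild it from the .txt master files."""
    conn.execute("DROP TABLE IF EXISTS topics")
    conn.execute("CREATE TABLE topics (web TEXT, topic TEXT, body TEXT)")
    for path in sorted(Path(data_dir).glob("*/*.txt")):
        conn.execute(
            "INSERT INTO topics VALUES (?, ?, ?)",
            (path.parent.name, path.stem, path.read_text(encoding="utf-8")),
        )
    conn.commit()


def find_topics(conn, needle):
    """Plain substring search answered from the parallel store, not the files."""
    rows = conn.execute(
        "SELECT web, topic FROM topics WHERE instr(body, ?) > 0 "
        "ORDER BY web, topic",
        (needle,),
    )
    return [tuple(row) for row in rows]
```

Here the body column is only scanned, not indexed; a real implementation would want a proper full-text or structured index, which is exactly the part this sketch leaves out.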

All the .txt storage is used for in practice is editing the Raw, audit trail, and regex searching. Any other process can happen from the parallel storage which for sure will be some sort of DB type storage.

I see no problem extending the .txt files in a compatible way and enable having sections that are pure HTML instead of TML as Crawford suggested. That is feasible and could enable very enhanced Wysiwyg editing for users that need this more than the ability to edit the raw content also.

-- KennethLavrsen - 08 Apr 2008

Kenneth, what would you propose to do away with island parsing?

Have a look at

just to name some, the list is much much longer. You as a customer are in love with those features most probably. Still TWiki is too slow for you and me.

One of the main reasons is that extracting information from the topic format isn't well defined. Each of these plugins does some sort of topic text analysis in a redundant way.

Just take tables: a lot of the above plugins need to access data stored in a TWikiTable. All of the plugins strive to grab the data on its own. Do you think this situation is all fine? The problem is buried in the way TWiki stores its data, not offering any central services to access it. TML is a fine input language but not rich enough and too ambiguous as storage format.
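As a sketch of what a central table service could offer plugins, the function below extracts every `|`-delimited table from a topic as a list of rows. It is deliberately naive and the name is hypothetical: it ignores column spans, continuation lines and `|` characters inside variables or links, which are exactly the ambiguities that make a shared, well-defined parser attractive.

```python
def parse_tml_tables(topic_text):
    """Return all tables in the topic as lists of rows of cell strings.

    A table is taken to be a run of consecutive lines that start and end
    with '|'. Cell text is returned as written, with surrounding
    whitespace trimmed.
    """
    tables = []
    current = []
    for line in topic_text.splitlines():
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            current.append([cell.strip() for cell in stripped[1:-1].split("|")])
        elif current:
            # A non-table line closes the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables
```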

-- MichaelDaum - 09 Apr 2008

Interesting discussion and great ideas, but will anything of it ever get implemented (and by whom)? Maybe in TWikiNG or was it TWikiXP?

-- FranzJosefGigler - 09 Apr 2008

Michael.

Yes, exactly!

The idea is that when you store a topic, the topic is stored in the usual raw format and in parallel in the new smarter format based on some sort of topic object model.

First thing to address for sure are tables. It is simple to imagine how to store TML based tables in any type of database because a database is in reality just a table.

Each time TWiki expands its topic object model we expand the Func API so that the plugins can for example read and write tables or edit and save sections of a topic.

My point is that we do not have to discard the .txt format to create the new format and by taking one step at a time there is a chance that it actually can get implemented. I wrote in another Codev topic that a good starting point would be to create the initial storage format so each time you save a topic the new storage handles:

  • Topic is parsed for access rights settings inside both meta and topic and the rights are stored in a set of DB tables that handle only access rights.
  • Form data is saved in simple tables where the tables are built from the form definitions.
  • Variable settings are stored in some DB format
  • Tables are stored as tables in the DB format
  • Entire topic text is stored in a format that is indexed for fast simple nonregex searching.

Accompanied by the right SEARCH syntax for searching data in tables, an API for plugins to read and write tables, and naturally with the existing API and core code taking advantage of the DB based access rights and indexes, we should get a significant performance gain already from 5.0.
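To make the "tables are stored as tables" step concrete, here is a minimal sketch in Python (not TWiki's Perl) of what could happen on save. Everything in it is an assumption for illustration: the function names, the `topic_cells` schema, the topic name and the use of SQLite are hypothetical, and the parser ignores escaped pipes, cell spans and links containing `|`.

```python
import sqlite3


def parse_tml_tables(topic_text):
    """Return a list of tables; each table is a list of rows of cell strings."""
    tables, current = [], []
    for line in topic_text.splitlines():
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            # Drop the outer delimiters, then split on the inner ones.
            current.append([cell.strip() for cell in stripped[1:-1].split("|")])
        elif current:
            # A non-table line ends the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


def store_tables(conn, topic, tables):
    """Write one DB row per table cell, replacing what was stored for the topic."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topic_cells "
        "(topic TEXT, table_no INTEGER, row_no INTEGER, col_no INTEGER, value TEXT)"
    )
    conn.execute("DELETE FROM topic_cells WHERE topic = ?", (topic,))
    for t, table in enumerate(tables):
        for r, row in enumerate(table):
            for c, value in enumerate(row):
                conn.execute(
                    "INSERT INTO topic_cells VALUES (?, ?, ?, ?, ?)",
                    (topic, t, r, c, value),
                )
    conn.commit()


# Hypothetical topic text; the .txt file itself stays untouched.
TOPIC_TEXT = """Some text
| *Name* | *Status* |
| Item1 | Open |
| Item2 | Closed |

More text
"""

conn = sqlite3.connect(":memory:")
tables = parse_tml_tables(TOPIC_TEXT)
store_tables(conn, "Sandbox.BugList", tables)

# A structured lookup that needs no regex over the raw topic text.
open_rows = conn.execute(
    "SELECT row_no FROM topic_cells "
    "WHERE topic = ? AND col_no = 1 AND value = 'Open'",
    ("Sandbox.BugList",),
).fetchall()
```

The raw topic remains the master copy in this sketch; the DB rows are a derived index that can be rebuilt from the .txt file at any time.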

I also think it is important that we do not try to do too much at the same time because then we will never get it done.

-- KennethLavrsen - 09 Apr 2008

I thought that the main point of this topic was to "get rid of the core dependency on TML" (for whatever reason).

If we agree that the semantics of TML are not expressive enough, and that it is not a very good storage format because of that, then getting rid of the "dependency" on TML (which is not the same as getting rid of TML) opens the door to choices: users can choose whatever syntax/format they want for their content.

The real culprit, the stone that is blocking our path to changing the storage, is SEARCH: its behavior is too coupled with the current storage format & mechanism. And it's already too late to change that. That's stuff for another topic.

-- RafaelAlvarez - 09 Apr 2008


CHECKPOINT

As I said at the start, this isn't a feature request, it's a brainstorming. Whether it gets implemented or not depends on the reader; if someone is excited by the ideas discussed here, they will pick up the banner and pursue them.

Notes on Compatibility and Scalability

I believe that the bottom line is that for any piece of code to be able to call itself 'TWiki' it has to fulfil certain criteria:

  1. It has to be able to read and write existing TWiki databases without loss of content, and without compromising use of those databases with older TWiki versions.
  2. It has to support %SEARCH. That means it has to be able to search the database as if it was stored in TML (even if it isn't).
  3. TML plain-text editing has to be available for those who were brought up on TML.
  4. TML has to be available via the Plugins handler interface for those plugins that want to process it.
Unfortunately these are usually seen as 'kill points' when it comes to making structured TWiki applications scale.

While I have seen scalability issues with unstructured applications, they tend to be addressable using external tech such as Google appliances or Lucene. As such they don't impinge on the TWiki core. As most applications scale up they become more and more structured. Some lightweight structured applications, such as BugsContrib, can go on for quite a while using just the basic features of %SEARCH, but inevitably they reach a point where they can scale no further. The application has to move to caching technology such as DBCacheContrib, but because of the nature of %SEARCH that step is currently really unclean (it requires you to recode all your searches). Applications that outgrow DBCacheContrib currently have no choice but to migrate to databases or other non-TWiki tech.

What we are looking for is a clean upgrade path that allows a structured TWiki application to scale from a few topics up to hundreds of thousands without having to be recoded several times. The underlying tech that supports the application can change, but the application itself cannot. There are two approaches to this:

  1. Code really clever tech that is able to scale stupid text-search type structured applications by finding and optimising structured elements
  2. Provide support in the basic application language such that application developers can capture structured applications that can then scale.
We have taken approach (2), as can be seen from the introduction of the query language into basic %SEARCH. You can still %SEARCH{"META:..., but if you do, please don't expect your application to scale.

However scalability is not the subject of this topic. There is no doubt in my mind that solutions exist to address it. Right now, the challenge for scalability-hunters is to find a developer motivated to actually do it.

The Real Subject

What I was interested to find out was whether there is a "low hanging fruit" for TWiki; viz use of rich HTML (HTML + TWiki variables) as the store form. Such a move would not IMHO require a vast amount of recoding, it would not block work on accelerators (cache, structured store etc), it would leverage existing plugins as much as possible, it would unleash WYSIWYG, it would enable other applications that want to work on the topic DOM, and perhaps most importantly, it would be an evolutionary development. The main negatives found so far are:

  1. many plugins rely on island parsers to interpret TML.
  2. "Old-style" TWiki applications rely on %SEARCH returning TML.
At this checkpoint:
  • Colas thinks the low-hanging fruit is over-ripe and not worth picking.
  • Rafael is interested in mapping Apples to Oranges, but sees the potential to use HTML as the baseline for both.
  • Sven also sees the potential, and thinks that a "user-steered" option might work.
  • Kenneth wants to be sure Apples always remain Apples, worms or no worms.
  • Michael wants a more aggressively structured store, and the low-hanging fruit probably isn't juicy enough for him.
My gut tells me that Colas is right. The implementation compromises are too deeply embedded, and too public, for there to be any significant gain from moving to HTML. Might as well throw the whole store away and start again with a structured store; in which case, you are probably not working on TWiki any more.

IMHO the inevitable conclusion is that TWiki needs to focus on making TML as painless to use as possible; even if that means enhancing it until it is rich as XHTML+CSS.

-- CrawfordCurrie - 10 Apr 2008

Well.. my point was not to map Apples to Oranges, but to allow Apples and Oranges to coexist in the same bag, and to use the appropriate peeler for each one to eat it.

Anyway, the irony of the situation is that trying to make TML as rich as XHTML+CSS also means that TML must be as "complex" as XHTML+CSS, and TML was supposed to be a simple markup language.

To summarize, am I right to say that the two critical issues resulting from this topic are:

  • There is no other option than trying to make TML as rich as XHTML+CSS
  • Existing SEARCH rely on TML as the storage format

-- RafaelAlvarez - 10 Apr 2008

There is no other option.... that's the direction that TablePlugin, RenderListPlugin etc have taken (%TABLE etc). I'm not saying it has to go all the way, just that if you turn your back on an existing rich representation, you are inevitably going to end up inventing a new one. Existing SEARCH rely... yes, that's a critical constraint. As is the use of island parsers in plugins.

-- CrawfordCurrie - 10 Apr 2008

Crawford: So we're in violent agreement :-)

I still think that the steps listed should be taken.

  • Removing the core dependency on TML is a good thing (again, this does not mean that TML will be removed from TWiki), as it reduces the coupling between the syntax and the core.
  • Changing plugins and skins to use HTML instead of TML will only help to improve performance (it's a lot faster to just emit HTML than to emit the rendered version of TML). We should deprecate the use of TML in skins at least (as per the normal deprecation procedure): Two years should be more than enough for people to update their skins and plugins.
  • A meta tag indicating that a topic has 100% HTML content will allow those users that only use WYSIWYG to create topics that will render faster (no TML processing), while allowing advanced users to create complex TWikiApplications with TML. Notice that the storage format is the same regarding META tags, only the content won't contain any TML. This means that current content will still be valid, and new content can be generated in "pure HTML" as desired.
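As a rough illustration of the third step, here is a sketch in Python (not TWiki's Perl) of a renderer that skips TML processing when the topic's meta data says the content is pure HTML. The `CONTENTTYPE` key and both function names are hypothetical, and `render_tml` is a toy stand-in that handles only `*bold*`; variable expansion, which would still apply to both kinds of topic, is left out.

```python
import re


def render_tml(text):
    """Toy stand-in for TML rendering: only *bold* is handled."""
    return re.sub(r"\*(\S(?:.*?\S)?)\*", r"<b>\1</b>", text)


def render_topic(meta, text):
    """Skip TML rendering when the (hypothetical) meta key marks pure HTML."""
    if meta.get("CONTENTTYPE") == "html":
        return text
    return render_tml(text)


# Existing topics carry no marker, so they are rendered as TML as before.
legacy = render_topic({}, "This is *bold* text")

# A WYSIWYG-only topic is passed through untouched.
wysiwyg = render_topic({"CONTENTTYPE": "html"}, "2 * 3 * 4 = <b>24</b>")
```

Because the absence of the marker means "TML", existing content needs no migration in this sketch.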

-- RafaelAlvarez - 10 Apr 2008

Good brainstorming discussions here, and good that we agree that this is brainstorming. Although I see the benefits of using HTML instead of TML as the native format, consider this: depending on the HTML generator / editor used, we get code like these few examples for the same visual result:

  • <img src="%ATTACHURL%/mail.gif" width="20" height="10" align="right" alt="" /> mail
  • <img src='%ATTACHURL%/mail.gif' width='20' height='10' align='right' alt='' /> mail
  • <img width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10" /> mail
  • <IMG width=20 ALIGN=right ALT= SRC=%ATTACHURL%/mail.gif HEIGHT=10> mail
  • <IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail
  • <IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail</IMG>
  • and 20 other ways...

These are real examples on twiki.org (seen frequently when I clean up the Sandbox.WebHome page). With so many variations it is impossible to do good version control, content analysis/manipulation and, last but not least, TML > HTML > TML roundtrips.

I think it is not a good idea to consider HTML as a data store the way current HTML editors/generators produce code. If we change to a different format it should be XML with a well defined DTD.

I am with Kenneth: there are a gazillion pages out there and we cannot move away from TML without keeping compatibility in mind. I think that the current TML is relatively well defined (others might disagree) and that with proper caching (such as into db tables and HTML) and pre-rendering (the JSP / ASP way) we can address the performance issues.

If we add pluggable syntax to TWiki we raise the complexity considerably and we get into DLL hell issues ("my pages I imported from our partner company's TWiki does not render properly", "the pages I restored from backup for the 5 year audit don't work", ...)

-- PeterThoeny - 10 Apr 2008

If this is brainstorming, you have just violated the rules. Please, unusual ideas are welcome.

-- ArthurClemens - 10 Apr 2008

"Kenneth wants to be sure Apples always remain Apples, worms or no worms."

Not sure I understand where you want to go with that analogy??? Never mind!

I supported the original proposal, that topics can have sections of HTML and sections of TML, from the beginning. The arguments about maintaining .txt storage and having a parallel storage were to address the other proposals to replace the TML based .txt files with something completely different, e.g. some XML based format. Changing the .txt format to something else would be the same as terminating the TWiki project and starting a new and different project.

I do not see the point in trying to enhance TML too much. TML works only because it is simple. If people need more advanced formatting then they will not be editing text files but using some more advanced tool, and then why invent a new markup? Then good old HTML will do just fine, and TMCE can produce this HTML well already.

A backwards compatible way to extend the .txt format would be to assume TML as the default and allow sections to be pure HTML + TWikiVariables. This section can be the entire topic if people desire this.

Plugins like EditTablePlugin and TablePlugin are not really relevant in a HTML only section.

I think it is possible to enhance the .txt format without breaking compatibility.

And we will still be able to implement the parallel DB storage format for access rights and forms, as this is totally independent of whether the topic content is TML or HTML.

-- KennethLavrsen - 10 Apr 2008

The point about diff only applies if you use a plain text diff on the source, and when diffing HTML you don't; you normalise it first. It's easy (and fast) to normalise HTML syntax by running it through HTML::Parser and a simple generator. If you store normalised HTML, you don't have this problem in the first place.
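A minimal sketch of such a "parser plus simple generator" normaliser follows. It uses Python's standard `html.parser` as a stand-in for Perl's HTML::Parser, and the class name, the `VOID` set and the chosen canonical form (lowercase names, attributes sorted by name, double quotes, `/>` on void elements) are assumptions for illustration only. The sample inputs are the quoted `img` variants listed earlier in this thread.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content; always emitted as <tag ... />.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Normaliser(HTMLParser):
    """Re-emit HTML in one canonical spelling."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased tag and attribute names.
        parts = [tag]
        for name, value in sorted(attrs, key=lambda a: a[0]):
            parts.append('%s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<" + " ".join(parts) + (" />" if tag in VOID else ">"))

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)
        if tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_endtag(self, tag):
        # A stray </img> and friends are dropped.
        if tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def normalise(html_text):
    parser = Normaliser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


variants = [
    '<img src="%ATTACHURL%/mail.gif" width="20" height="10" align="right" alt="" /> mail',
    "<img src='%ATTACHURL%/mail.gif' width='20' height='10' align='right' alt='' /> mail",
    '<img width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10" /> mail',
    '<IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail',
    '<IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail</IMG>',
]
canonical = {normalise(v) for v in variants}
```

All five inputs collapse to a single string here, so a plain text diff of the normalised form sees no difference between them. The unquoted `ALT= SRC=...` variant is deliberately left out: a tolerant parser reads `SRC=...` as the value of `ALT`, so it would need repair rather than normalisation.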

Note also that XHTML is XML with a well defined DTD.

My conclusion that Colas is right stems from the nature of TWiki variables; you can't store structured content when you allow them, so why bother trying to store structured content? Consider:

   * Set OB = <
   * Set CB = >
Oh%OB%P%CB%No!</P%CB%
This is a trivial example of a fundamental problem with the idea of a structured store for content; you can't tell what the structure is until all TWiki variables have been expanded. You can also see that it's impossible to render this stored data as anything other than HTML. The meta-syntax of the output has been predetermined by what is stored.
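The effect is easy to reproduce outside TWiki. The sketch below is Python, not TWiki code; `expand_variables` and `tag_names` are hypothetical helpers, and the tag scanner is deliberately naive. It looks for opening tags in the stored text and again in the expanded text.

```python
import re


def expand_variables(text, settings):
    """Replace %NAME% with its setting; unknown names are left untouched."""
    return re.sub(
        r"%([A-Z]+)%",
        lambda m: settings.get(m.group(1), m.group(0)),
        text,
    )


def tag_names(text):
    """Naively list the element names of the opening tags found in the text."""
    return re.findall(r"<([A-Za-z][A-Za-z0-9]*)", text)


settings = {"OB": "<", "CB": ">"}
stored = "Oh%OB%P%CB%No!</P%CB%"
expanded = expand_variables(stored, settings)

before = tag_names(stored)    # what a structured store could see
after = tag_names(expanded)   # what the browser ends up seeing
```

Before expansion the scanner finds no opening tag at all; the paragraph element only exists once `%OB%` and `%CB%` have been substituted.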

JotSpot solve this by requiring their equivalent of TWiki variables to be well-formed XML. You cannot inject arbitrary syntax into a JotSpot topic - which is a major strength for structured applications, but a major weakness for learning and flexibility.

I do not see the point in trying to enhance TML too much - the problem with this is well illustrated by tables. HTML has incredibly rich support for tables. At the moment, WYSIWYG has no choice but to store complex tables as HTML, as TML lacks support for anything but the simplest tables.

TWiki could take two approaches to address this problem:

  1. The plugin approach - enhance TablePlugin and friends until they can support all the formatting,
  2. The MediaWiki approach - enhance the markup language until it provides much the same support for table formatting as HTML does.
By deferring to TablePlugin we have implicitly selected (1), which creates a problem for HTML2TML. It's (relatively) easy to map from %TABLE tags to HTML, but the reverse mapping is a nightmare. This is because HTML is rich, and if all this richness is used to define a table, the resulting HTML has many different ways to define the same thing, all of which have to be mapped back to a %TABLE parameter. A simple illustration:
<td class="redBackground">...</td>
<td style="background-color:red">...</td>
<td bgcolor="red">...</td>
The result of this is that HTML2TML basically ends up stripping out all the careful formatting someone does in WYSIWYG - or, more importantly, their existing HTML when HTML2TML is used to import content from another source. This makes TWiki look really bad, especially when WYSIWYG is used exclusively in TWiki; users just can't see why it can't retain their formatting. So when a complex table is imported, we skip translation to TML and keep it as HTML. This drift to HTML begs the question "why not store the whole topic in HTML", which is where we started. Why burden ourselves with TML when it is only usable for a small fraction of the content?
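To show why the reverse mapping is awkward, here is a hedged Python sketch that tries to recover one background colour from the three cell spellings in the illustration. It is not how HTML2TML works; `cell_background` and the `CLASS_BACKGROUNDS` lookup are hypothetical, and the lookup stands in for stylesheet knowledge that a converter normally does not have.

```python
import re

# Hypothetical stylesheet knowledge: which CSS classes imply which background.
CLASS_BACKGROUNDS = {"redBackground": "red"}


def cell_background(attrs):
    """Return the background colour implied by a <td>'s attributes, or None."""
    style = attrs.get("style", "")
    match = re.search(r"background-color\s*:\s*([^;]+)", style, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    if "bgcolor" in attrs:
        return attrs["bgcolor"]
    for cls in attrs.get("class", "").split():
        # Only resolvable if the stylesheet is known to the converter.
        if cls in CLASS_BACKGROUNDS:
            return CLASS_BACKGROUNDS[cls]
    return None


cells = [
    {"class": "redBackground"},
    {"style": "background-color:red"},
    {"bgcolor": "red"},
]
colours = [cell_background(c) for c in cells]
unknown = cell_background({"class": "fancy"})
```

Each spelling needs its own branch, and that is for a single property of a single cell. A class the converter has never seen yields nothing, which is where the formatting gets lost.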

On the flip side of the coin, the simple structure of TWiki tables has enabled powerful plugins such as EditTablePlugin and SpreadSheetPlugin. It would be nice to be able to use these plugins on tables with complex formatting. AFAIK there are only three ways to do this:

  1. Extend the plugins so they parse HTML (the island parser approach)
  2. Modify the plugins so they no longer know TML but work off a DOM (the central service approach Micha advocates above)
  3. Extend TML and the Plugins to support complex formatting (the MediaWiki approach)
Tables are really just the tip of the iceberg; there are many other niggly areas where current TML just doesn't cut the mustard. Hence the call to extend TML.

-- CrawfordCurrie - 11 Apr 2008

Now don't choke on your coffee or tea but a question from the outside of core dev view. TWiki should evolve, ok, but could it be possible to reverse the problem and say that a new TWiki version works without TML in core, and if you need rendering of old syntax you must install the RenderTMLContrib or something? Then existing customers and new can use good ol' TML if they wish.

Or a compatibility switch in configure leveraging new versions of core modules. Other products have features like that; you can switch mode one time and then you cannot switch back. I have no idea how much labour it would take or how hard it would be to achieve technically; it is just a simple thought.

-- LarsEik - 11 Apr 2008

No risk of that, Lars, as it's a reasonable idea. It's a long step further than I was thinking of going, as it would require extensive re-architecting of the TWiki core, but there's no reason that shouldn't be done if someone were committed enough.

-- CrawfordCurrie - 12 Apr 2008

Topic revision: r33 - 2008-04-12 - CrawfordCurrie
 