Brainstorming: TWiki without TML
A lot more people - especially new adopters - are using TWiki with WYSIWYG. Such users are not learning TML (TWiki Markup Language), except when they come to build new TWiki applications, at which point they face a huge learning curve. TML was a great idea at the time - it allowed users to generate complex markup with minimum learning and very simple tools - but this shift towards WYSIWYG in the user landscape means we should reconsider whether it still has such an important role.
One of the main pieces of work I have done over the last couple of years has been the development of the WysiwygPlugin. This plugin provides mappings from TML to HTML and HTML to TML, so I'm in a pretty good position to comment on the limitations of TML. More recently I have been working with a number of former JotSpot customers to port their topics from Jot to TWiki, and this has further highlighted this problem.
Please note that this is not a feature proposal. It is a discussion topic, intended to make developers (existing and potential) think out of the box.
What is TML Rendering?
Broadly speaking, Peter designed TWiki to separate the generation of HTML into three phases: template instantiation, variable expansion, and TML rendering.
- template instantiation - is the process of reading and expanding skin templates, and can be thought of as a "compile time" process
- variable expansion - is the expansion of TWiki variables such as %SEARCH to TML and HTML
- TML rendering - is the process of recognising TML constructs, such as *text* used to represent bolded text and | delimited tables, and expanding them to HTML.
This discussion relates to the third of these, TML rendering. The thesis of this article is that template instantiation and variable expansion are the "guts" of TWiki, and that TML rendering was a necessary, but now potentially redundant, step.
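The three phases can be pictured with a deliberately simplified sketch. Everything here is an assumption made for illustration: the function names, the one-line template and the single bold rule are hypothetical, and the real core interleaves these phases rather than running them strictly in sequence. The `render` flag shows the idea under discussion, namely skipping the third phase for a topic that is already HTML.

```python
import re

# Hypothetical one-line skin template; not taken from any real TWiki skin.
TEMPLATE = "<html><body><h1>%TOPIC%</h1>%TEXT%</body></html>"


def instantiate_template(template, text):
    # Phase 1: "compile time" step that drops the topic text into the template.
    return template.replace("%TEXT%", text)


def expand_variables(text, variables):
    # Phase 2: replace %NAME% with its value; unknown variables are left alone.
    return re.sub(r"%([A-Z]+)%",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)


def render_tml(text):
    # Phase 3: one toy TML rule, *bold* becomes <b>bold</b>.
    return re.sub(r"\*(\S(?:[^*]*\S)?)\*", r"<b>\1</b>", text)


def view(topic_text, variables, render=True):
    page = instantiate_template(TEMPLATE, topic_text)
    page = expand_variables(page, variables)
    # Skipping phase 3 is what "TWiki without TML" would mean for this topic.
    return render_tml(page) if render else page
```

Calling `view("Hello *world*", {"TOPIC": "WebHome"})` runs all three phases, while passing `render=False` leaves the asterisks untouched.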
What are the problems with TML?
TML is pretty powerful, but it has a number of problems that make it really challenging to convert to and from HTML.
- Tables. TML tables are flat, and the markup you can have in table cells is extremely limited. Many uses of tables require presentation of complex content in tables. Also, the control you have over the presentation of tables is very limited. The TablePlugin ameliorates this somewhat, but is still constrained by the features the designers chose to incorporate; full CSS support is a long way off.
- Lack of structure. For maximum usability, TML is freeform - it doesn't require the topic to be parseable, as TML constructs are expanded using regular expressions applied in a highly context-sensitive environment. This lack of structure leads to (thankfully very few) ambiguities, but it still defeats any possibility of creating a context-free parser for the language. This fundamentally limits the efficiency of the language. It also makes it very difficult to subdivide topics into structural elements - for example paragraphs, headings, tables etc - a process which is key not only for improving the editing experience, but also for addressing topic content externally (something a structured wiki ought to be able to do).
- Store efficiency. As attractive as the flat text files approach is, it's hard to use modern database tools with TML. You have no choice but to store topics flat, which leads to problems searching, and compromises efficiency in the store.
- Extensibility. Plugin authors often want to make small extensions to TML to enable extra functionality - for example, they may want to add new highlighting types, or new block structures that delimit specialised content. It is very difficult to extend TML in such a way that such specialised content doesn't conflict with extensions made by other plugins. It's also difficult to stop the core from imposing its own interpretation on specialised content.
- Portability of content. Because TML constrains what can be done so heavily, importing content from other wikis is a real PITA. Even MediaWiki offers better table formatting support built into the language. Importing HTML requires running it through the WysiwygPlugin, which inevitably strips out carefully designed features of the input HTML. Sometimes the only option is to import HTML as HTML, and accept that the result will be read-only topics.
- Performance. A significant chunk of topic view time is spent rendering TML.
- WYSIWYG. Modern browsers are designed with built-in DOM editing tools (MIDAS). It is really, really difficult to leverage these tools when you are restricted to TML.
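The extensibility problem in the list above can be shown in a few lines. This is a toy illustration under stated assumptions: the `<math>` block is a hypothetical plugin extension, and `core_bold` stands in for the core's regular-expression rendering. It is not the real TWiki code. The second function sketches the usual workaround of stashing protected blocks before rendering and restoring them afterwards.

```python
import re


def core_bold(text):
    # Stand-in for core regex rendering: *bold* becomes <b>bold</b>.
    return re.sub(r"\*(\S(?:[^*]*\S)?)\*", r"<b>\1</b>", text)


def render_protected(text):
    # Hypothetical workaround: hide <math> blocks from the core,
    # render everything else, then put the blocks back unchanged.
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return "\x00%d\x00" % (len(blocks) - 1)

    text = re.sub(r"<math>.*?</math>", stash, text, flags=re.S)
    text = core_bold(text)
    return re.sub("\x00(\\d+)\x00",
                  lambda m: blocks[int(m.group(1))],
                  text)
```

Here `core_bold("<math>a*b*c</math>")` turns the multiplication signs into bold markup, which is exactly the core "imposing its own interpretation", while `render_protected` leaves the formula intact. Every plugin that needs this has to reinvent the same trick.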
What is the alternative?
Right now you can write a TWiki application in (almost) pure HTML. Such topics still have embedded TWiki variables (such as %TOPIC%) but don't use TML - except where it is generated as a result of the expansion of a TWiki variable, such as %SEARCH. A TWiki application written in HTML has to be written to avoid any TML constructs, which can be tricky, but it is feasible. This suggests that in many cases, TML can be eliminated from the rendering loop simply by not calling the TML rendering function.
Of course, some TWiki variables generate TML; the most glaring is %SEARCH, which generates its output this way. However it would be fairly straightforward to convert this to generate template-driven HTML; indeed, the opportunity could be taken to fix this when ResultSets are implemented.
What could go wrong?
Of course nothing is ever quite that simple. TML rendering is intimately interleaved with variable expansion; there are macro rendering loops and micro rendering loops which knot and spiral around each other in the core. I don't want to trivialise or underestimate the amount of work involved in moving from TML as the principal store form to HTML. Whoever takes it on is facing some considerable challenges.
One obvious problem is that some plugin authors generate TML. Plugins would have to be a lot more conscious of the structure of the topics they are modifying, and would have to generate structured output. Removing support for TML would have a considerable impact on such plugins. It would be interesting to see which commonly used plugins would be impacted by this; my suspicion is that there are only a few, and they are mostly related to repairing the holes in rendering left by TML (e.g. TablePlugin). Another important point for plugin authors is that HTML is hard to write, unless you are immersed in it. TML is a lot simpler to generate.
The main problem, however, would be compatibility with existing content. While plain topics could be run through WysiwygPlugin to convert the content to HTML, TWiki applications present a tougher challenge. Many TWiki applications generate TML, sometimes in quite obscure ways. There would be no escaping from this.
Steps on the road
Here's what I think would need to be done to make this a reality:
- Eliminate core reliance on TML
- Eliminate TML from skins
- Tag in TWiki meta to say "this topic is HTML, don't render it". The same tag would disable WysiwygPlugin when WYSIWYG editing.
- Analyse use of TML in key plugins, and port them to use HTML instead.
Nice to have:
- Use WysiwygPlugin to convert stored HTML to TML for editing in the plain-text edit box.
--
Contributors: CrawfordCurrie - 05 Apr 2008
Discussion
Hm, can someone tell me why I don't like the thought of being 'restricted' to HTML? If you change the underlying topic content format, why not change to something more DITA-like? Well, I will miss plain text editing anyway. For me that's part of the Wikiness of Wikis. When this is gone, you may as well use Office.
--
FranzJosefGigler - 05 Apr 2008
Crawford, you are spot on with your analysis. But I really think that moving in your direction requires a totally new "wiki" redesigned from scratch, with more focused goals. Making TWiki evolve will require an enormous amount of work and thousands of heart-breaking decisions on which compatibilities to break. One example of a from-scratch design by an ex-TWiki user can be seen at SweetWiki of MichelBuffa (who also uses JotSpot a lot). Also, like Franz, I think that TWiki now (with smartedit or the natskin editor) works so well for the personal / geek teams cases that I want it to stay stable so I can build content on it. The corporate case is the one that would benefit from a from-scratch rewrite. Maybe we need to make a clean evolutionary step, like perl6, python3000, ... and start from scratch - with a new name - while maintaining the current TWiki satisfying its current "market".
OTOH I may be totally wrong; one can see nice things such as the QuerySearch that quietly remove TML dependencies step by step. I think that the main pieces lacking from the puzzle to go this way would be (apart from the obvious, such as a structured backend storage and a WYSIWYG editor) a real server-side scripting language (server-side javascript would be cool, so one would have to know only one language for both server and client sides, but a widely used one such as lua could be a valid choice), and a template language with an efficient cache built-in and integrated with the chosen server-side script... but then maybe it is actually restarting it from scratch anyways...
--
ColasNahaboo - 05 Apr 2008
Even though I agree that a complete rewrite might allow us to do some amazing stuff, we don't have the people to do it in a reasonable time frame.
Anyway, if I understood well, what Crawford is proposing by "eliminate the core reliance on TML" is not something that is impossible to do, and can even be implemented in a backward-compatible mode.
MovableType has a feature I really like, which is that you can choose your input format for each post. And it's something quite simple: it doesn't even translate between dialects.
We can do something similar in TWiki. What I think needs to be done to "eliminate core reliance on TML" is:
- The mechanism to process Templates and TWiki variables should be left as is
- Extract the TML rendering code from its current place into a separate module (TWiki::Renderer::TML?)
- Create some other renderers, to test the concept (TWiki::Renderer::MediaWiki, TWiki::Renderer::HTML?)
- In the Edit template, allow the user to select from the available list of formats (changing a format may even trigger the use of Plugins.WysiwygPlugin as needed). TML is the default.
- On save, store in TOPICINFO the format of the topic.
- Upon view, in the rendering function, use the renderer that corresponds with the topic format (use TML if no format was specified, for backward compatibility)
- Make sure that all the core TWiki Variables emit either more variables or HTML.
For convenience, we should add to the published API a function that renders a text in whatever format is passed as parameter.
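A minimal sketch of how the per-format dispatch described above might look. All names here are hypothetical (the proposal itself only floats module names such as TWiki::Renderer::TML), the renderers are toy one-rule versions, and a plain dictionary stands in for the topic and its TOPICINFO format field. The final `render` function plays the role of the suggested API call that renders text in whatever format is passed.

```python
import re

# Registry mapping a format name to its renderer function (hypothetical).
RENDERERS = {}


def renderer(fmt):
    def register(fn):
        RENDERERS[fmt] = fn
        return fn
    return register


@renderer("tml")
def render_tml(text):
    # Toy TML rule: *bold*
    return re.sub(r"\*(\S(?:[^*]*\S)?)\*", r"<b>\1</b>", text)


@renderer("mediawiki")
def render_mediawiki(text):
    # Toy MediaWiki rule: '''bold'''
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)


@renderer("html")
def render_html(text):
    # Already HTML: pass through untouched.
    return text


def render(text, fmt=None):
    # Topics saved before formats existed carry no format, so assume TML.
    return RENDERERS[fmt or "tml"](text)


def view(topic):
    # 'topic' is a plain dict standing in for a topic plus its stored format.
    return render(topic["text"], topic.get("format"))
```

For example, `view({"text": "*hi*"})` falls back to the TML renderer, while `view({"text": "'''hi'''", "format": "mediawiki"})` produces the same HTML from a different dialect.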
Then there is the issue that getRenderedVersion is called a zillion times. If we make it a rule that templates cannot use any specific syntax, only valid HTML/XML and TWiki variables, then getRenderedVersion will be called a lot less.
Finally, there is the issue that INCLUDEd topics should be rendered in their formats. I don't know how difficult this will be, actually (it has been ages since I last saw that code).
One of the side effects of this approach is that it is up to the skin designer whether all the formats or only some of them will be available for edit.
Later on, if we want to remove the HTML dependency we can think about having an "OutputRenderer" component or something like that... but one step at a time.
--
RafaelAlvarez - 05 Apr 2008
While I think that doing away with the typographic oriented TML should be rather easy, there are more problems in editing, storing and processing tables, sections of text or lists. While variables are integrated into the parser using handlers, tables, sections and everything line-oriented aren't at all. Just take a look at the code of TablePlugin, EditTablePlugin, EditRowPlugin or SpreadSheetPlugin, but also RenderListPlugin and TreeviewPlugin: they all need chunks of TML as their input. Each of these plugins implements its own redundant topic text parser and each of these island TML parsers is called multiple times for one request. That's because TML is thought to be a one-way thing only, generating HTML. Using it as a storage format does not provide enough semantics to ease the situation for all of the named plugins. Nor does HTML, IMHO. That is: HTML is a bad storage format as well. It may be a near hit to the format used to display content in browsers, but it is not suitable to carry enough semantics for the storage format of a CMS.
--
MichaelDaum - 05 Apr 2008
Colas, the goal here is to explore where TWiki can be taken. Implementing a new wiki might be easier, but it wouldn't be TWiki.
Rafael, pluggable syntaxes? Interesting idea. Plugging in different syntaxes takes you back towards the problem with TML, that the languages are not rich enough to represent each other, so transformation between them and the core representation will inevitably be lossy. People would end up being restricted to the intersection of all the supported formats used in their wiki, wouldn't they?
Michael, yes, but I suspect most of these plugins use their parsers to build internal analogues of the DOM. If there was a central parser that was able to present a pre-parsed DOM to them, arguably you could rationalise them all, though it would be a considerable effort.
The real problem for compatibility is the way that variable expansion and TML rendering are interleaved, so that a plugin can defer the expansion of some TML by clever encodings that the author knows won't be expanded until later - knots and spirals.
Obviously the storage format would not be HTML. JotSpot stores topics in HTML, but that HTML is stored in the context of an XML document, and may itself contain embedded XML, such as scripts. HTML does not have TWiki variables. Of course, TWiki variables are themselves freeform, unstructured, and can be used to generate new structured content, which strongly suggests that storing topics that contain TWiki variables in a structured DB is not possible.
I wonder if there might be some middle ground, where by flipping a switch in a topic you could choose to have WysiwygPlugin convert as much as possible back to TML (the current default mode) or you could tell it to leave content in HTML? That way you can choose to make the compromise between richness of the topic (WYSIWYG gives you the full power of HTML that isn't available with TML) versus the availability of plugins that process tables and lists in the topic. You could even narrow that down such that you could electively turn rendering on and off within a topic e.g. %TML{off}%..%TML{on}% (hmmmm, doesn't the <literal> tag do something close to that already?)
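To make the in-topic switch concrete, here is a rough sketch of how %TML{off}% and %TML{on}% markers could gate rendering. The marker syntax is the one floated above and does not exist in TWiki; `render_tml` is a toy one-rule stand-in, and rendering each segment separately is an assumption that means a TML construct cannot span a marker.

```python
import re


def render_tml(text):
    # Toy stand-in for TML rendering: *bold* becomes <b>bold</b>.
    return re.sub(r"\*(\S(?:[^*]*\S)?)\*", r"<b>\1</b>", text)


def render_with_switch(text):
    # The capturing group makes re.split keep the markers in the result,
    # so the loop can toggle state as it walks through the segments.
    parts = re.split(r"(%TML\{(?:on|off)\}%)", text)
    enabled = True  # rendering is on until told otherwise
    out = []
    for part in parts:
        if part == "%TML{off}%":
            enabled = False
        elif part == "%TML{on}%":
            enabled = True
        else:
            out.append(render_tml(part) if enabled else part)
    return "".join(out)
```

With this, `render_with_switch("*a* %TML{off}%*b*%TML{on}% *c*")` renders the first and last asterisk pairs and passes the middle section through verbatim.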
--
CrawfordCurrie - 06 Apr 2008
One could argue that perhaps 'replace TML' is throwing out the baby just because the bathwater has slightly cooled.
Most of the experienced TML authors separate out content topics from application topics, and this can be extended to make 2 types of topics, or even 2 types of webs.
Part of our trouble with Wysiwyg is that people seem to expect to mix things like complex settings with prose - a somewhat literate mash that really only makes sense to application developers who are into literate programming.
So.. if we separate concerns into APP (ie, TML) and Wysiwyg: ie XHTML fragments? with class and ids used to identify content to the APPs, and then the template selection mechanism for APP selection, are we resolving some of the perceived issues in an evolutionary fashion?
(Note, I'm not saying this is the answer, nor that I understand the question, but I do wonder)
--
SvenDowideit - 06 Apr 2008
Crawford, what I described has nothing to do with converting between these syntaxes: it makes no sense and may actually be impossible. Having pluggable syntaxes has three advantages:
- It's easier to grok than TWiki Without TML.
- Makes migration from other wikis easier.
- Forces plugins developers not to rely on the TWiki renderer for their output, so they should output proper HTML.
Whether all the syntaxes, or just one of them, will be available to users is an implementation detail (I'm ok either way). The topic would be stored in the syntax chosen by the original author, and rendered as HTML.
The key barrier is the fact that TML and Variables expansion are intertwined, because I bet there are apps out there that rely on this fact to complete some incomplete TML before it's all rendered. If we untangle the rendering process, these apps will break.
But then, we may break a lot of apps if
--
RafaelAlvarez - 06 Apr 2008
For sure we cannot leave the TML format, and taking it out of the core is in my view not feasible.
We - the old existing users - have 10000s or 100000s of topics in TML and TWiki Applications that rely on the format.
And TML itself is not a bad format. Many users still prefer to edit in TML. HTML is so complex that normal people cannot edit the source. Once you leave simple wiki markup behind you are in Wysiwyg land only.
And one of the reasons TWiki Applications are so easy to make is the simple format of TML. Without the ease of making applications all you have left is MS Word.
TML makes it easy both to let applications find information and generate information because the markup is so simple.
I know it is difficult to make Wysiwyg work with TML to HTML translation and I have been one of the strongest advocates for getting a good Wysiwyg editor. But I do not want to give up TML. We have a wiki function at work which is pure HTML with Wysiwyg editing in the browser. But no application engine, no wiki words. The result is a wiki very few use because it cannot do more than you can do in a Word document uploaded on a file server.
It is important that we do not change TWiki from being a wiki to being a typewriter application, a word processing program like Word, or even a CMS. Always remember what it is that makes TWiki unique and stand out from the rest.
Making TWiki Applications with searches that return tables and forcing people to use HTML will be the end of TWiki as we know it. It will become too geeky. It will require intense knowledge about HTML and it will in reality become pure programming to make applications.
The middle ground solution Crawford is talking about may have some good sides and be worth exploring. But we should not give up TML. The simple markup is a strength in many ways.
To me it has never been important that the Wysiwyg editor could do very advanced things.
Doing headers, bullets, the most common formatting like bold/italic, inserting images, and above all a decent table handling was the important features we needed.
We have now all those features. Problem is that they are buggy and that the TMCE displays features that do not work in TML. And then we have the I18N issues.
Getting the TMCE working stable and more bug free, limiting the available features to what is really needed, and getting UTF8 working is really what is missing.
I always become afraid when I see debates where some of you guys express the need to forget the past and start all over. We HAVE TO maintain compatibility. We have to protect the huge investment TWiki users have put into the content of their TWiki. Whatever is proposed must always take this into account. Otherwise TWiki is not a trustworthy product for the future.
--
KennethLavrsen - 06 Apr 2008
There is nothing before your comment suggesting a need to "forget the past and start all over". That's not implied anywhere in this discussion (except in Colas' suggestion of starting again from scratch).
Please remember (and this can be hard to understand) that the store format does not need to be the same as the editing format. This is amply demonstrated by Rafael's points, above. As long as you can make editing using TML available to users, and you meet the criteria for compatibility of existing TWiki databases, then you can store topics in whatever format works; there is no implication that you would have to force users to develop "intense knowledge about HTML".
As an illustration of this point, consider the following edit cycle:
- User views topic, hits edit button
- System retrieves "decorated HTML" (HTML with TWiki variables) from disc
- WysiwygPlugin runs HTML2TML on content
- Content - now plain text with embedded TWiki variables - is displayed in a TEXTAREA for editing
- User saves. System runs TML2HTML on the content to regenerate the stored form
- Resulting decorated HTML is stored in the DB.
As Rafael pointed out, you can replace HTML2TML and TML2HTML with HTML2MediaWiki and MediaWiki2HTML to allow editing in different dialects.
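The edit cycle above, with swappable dialects, can be sketched as follows. This is only an illustration under heavy assumptions: the converters are hypothetical toys that handle bold text and nothing else (the real WysiwygPlugin conversion is far more involved), and `user_edit` is a function standing in for whatever the user types into the TEXTAREA.

```python
import re


# Toy converters covering bold only; names mirror the steps above.
def html2tml(s):
    return re.sub(r"<b>(.*?)</b>", r"*\1*", s)


def tml2html(s):
    return re.sub(r"\*(\S(?:[^*]*\S)?)\*", r"<b>\1</b>", s)


def html2mediawiki(s):
    return re.sub(r"<b>(.*?)</b>", r"'''\1'''", s)


def mediawiki2html(s):
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", s)


# Each editing dialect is a pair: (stored form -> text, text -> stored form).
DIALECTS = {
    "tml": (html2tml, tml2html),
    "mediawiki": (html2mediawiki, mediawiki2html),
}


def edit_cycle(stored_html, dialect, user_edit):
    to_text, to_html = DIALECTS[dialect]
    editable = to_text(stored_html)   # what the TEXTAREA would show
    edited = user_edit(editable)      # the user's changes
    return to_html(edited)            # decorated HTML to store again
```

Note that TWiki variables such as %TOPIC% pass through both directions untouched, which is what makes the stored form "decorated HTML" rather than plain HTML.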
As Micha says above, the important thing is to make sure that the semantics of the stored form are rich enough to represent all the editing dialects, including HTML/WYSIWYG. This is the key point of this discussion. Right now the stored form - TML - does not fulfill this criterion, and we are discussing how we might address this problem. One proposed solution is to extend the existing TML stored form - using %TML{off}% or equivalent - to support embedding sections with richer semantics - or, as Sven puts it, "separate complex settings from prose".
The technical detail is how we address the fact that rendering and variable expansion are intertwined, and deal with the fact that some plugins may be written to depend on this intertwining.
Sven, the "2 types of web" idea is an interesting one, especially in the light of applications we have seen in clients where the automation is concentrated in subwebs (or subsets of topics). It begs the question whether the "switch" idea needs to work at the fine granularity of subsections of topics, or whether it would suffice to make it apply at the whole topic/web level. I can imagine:
- Set ALLOWTOPICTML = on
- Set ALLOWWEBTML = on
settings controlling whether WYSIWYG attempts conversion back to TML or not (c.f. permissions).
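One possible reading of how these two settings could combine is sketched below. The precedence (a topic-level setting overrides the web-level one, and conversion stays on when neither is present) is purely an assumption made for illustration, as are the function names. The parser accepts both the `* Set` bullet form and the `- Set` form shown above.

```python
import re

# Matches preference lines such as "   * Set NAME = value".
SET_RE = re.compile(r"^[ \t]*[*-][ \t]+Set[ \t]+(\w+)[ \t]*=[ \t]*(\S*)[ \t]*$",
                    re.M)


def settings(text):
    # Collect all "Set NAME = value" lines into a dictionary.
    return {name: value for name, value in SET_RE.findall(text)}


def convert_back_to_tml(topic_text, web_prefs_text):
    topic = settings(topic_text)
    web = settings(web_prefs_text)
    # Assumed precedence: topic setting first, then web setting, then "on".
    value = topic.get("ALLOWTOPICTML", web.get("ALLOWWEBTML", "on"))
    return value.lower() == "on"
```

Under these assumptions a web can switch conversion off for all its topics while an individual topic opts back in, much as access permissions can be set at either level.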
--
CrawfordCurrie - 07 Apr 2008
The problem is not so much the plugins using TML as it is the many many TWiki Applications out there that rely on the data stored in TML (SEARCH).
Making constant HTML2TML / TML2HTML is our major problem today. Why would that suddenly work?
I do not believe it is feasible to store topics in some obscure format and convert back and forth between TML and this format.
If we cannot make it work today with TMCE/Wysiwyg Plugin what makes people think we can make it work in a much more complicated application?
I think the users are going to see garbage half the time when editing raw TML and that topics are going to be changed uncontrollably during an edit raw/save cycle with such an implementation.
This topic started off discussing TWiki WITHOUT TML based on the trouble we know from our Wysiwyg implementation. And a few days later it has evolved into even more translations.
No - I prefer that the topics are stored in the same raw format that you edit raw. Then I would rather see a dual mode feature where the user can choose if the topic is stored in TML or in HTML (with embedded TML allowed enclosed in some form of TML on/off tags). But converting when editing raw - that I think is a near impossible task. It will never work well.
--
KennethLavrsen - 08 Apr 2008
I know all too well these TWikiApplications that parse bits out of a topic text using pattern(). None of these applications feel rock solid. Nor are they maintainable. The larger TWikiApplications become the less you use this kind of application. It is simply too expensive to do these kinds of applications on a large scale. Think of your customers once from that angle. Nor do these applications scale in the sense of computation.
No - I prefer that the topics are stored in a format that (a) information can be extracted easily in a well defined way and (b) that supports any markup you want to use to write them (html, tml, whatever).
Means: there is no innate necessity that the way content is produced and how it is stored must inevitably be the same. Au contraire, storing it in a semantically rich way will foster not hinder TWikiApplications as well as WYSIWYG and WikiMarkup.
--
MichaelDaum - 08 Apr 2008
What are the implications if TWiki has its own WYSIWYG stemming from TML? That is, creating TWiki's own 'TinyMCE' that streamlines with TML.
--
KwangErnLiew - 08 Apr 2008
The problem isn't the editor; WysiwygPlugin already does a pretty good job converting TML into the kind of DOM an editor requires, so you can use an HTML editor. The problem is TML; it just isn't rich enough to act as a base format for storing modern content.
--
CrawfordCurrie - 08 Apr 2008
By looking at it, an interesting solution (I have one week free in early May, I hope I will be able to implement something as a proof of concept on this - but feel free to beat me to it) would be to use a code editor in javascript, and make a TML mode for it. This will not solve the WYSIWYG case, but could properly replace smartedit/natedit and allow us to focus more on the wysiwyg now that the geek-mode is taken care of :-). Basically, TML is code, so it needs a code editor.

Please look at
http://en.wikipedia.org/wiki/Comparison_of_Javascript-based_source_code_editors
codemirror, helene, and 9ne seem promising but young.
--
ColasNahaboo - 08 Apr 2008
The markItUp! Universal Markup Editor looks great as well. Very flexible.
--
MichaelDaum - 08 Apr 2008
As a follow up on the storage format. Please understand that our TWiki applications that seek and display information in a way that depends on the content being TML work fine today. Those applications work! They add value. They are great and so damned simple. Do not try to tell us existing customers that they do not and can be ignored or discarded. Many of our applications work by searching a small defined list of topics and displaying the information. An example is our hotline application that searches for information in our weekly reports. This application only searches in the active projects and only needs to search maybe 5 topics. Change the format of our .txt files and nothing will work anymore. Please have respect for the value that lies in our content. I say it again and again. The real accumulated value to us corporate users lies in the content and not in TWiki. TWiki has existed for so many years now and produced so much data that you cannot just ignore this and propose changing the foundation we have built this information on.
The problem with performance is when applications have to search all or most of the files in a web and the number of topics grows and grows. Typically these are applications where each topic is like a record in a database. Our bugs web is a good example. Or the Codev applications where the search has to look through ALL files searching for a form name in meta. These applications are the ones where searching TML flat files is hopeless. This is where we desperately need an indexed DB type storage.
Another issue with current regex searching is the geek level you need to be at to make these applications. For sure we need the storage format which is structured.
The existing-customer-aware way to proceed to 5.0 is to keep the good TML format in our .txt files and add an additional parallel storage format which TWiki uses for query type searches, access rights lookup etc etc. This parallel storage format can be any format. We have free hands. It can be xml. It can be an Oracle database. It can be anything. The purpose of this parallel storage should be fast and easy access to structured and indexed data.
Most old TWikis are not used for one application but 100s of small applications. Some have 100 or 200 webs and in each maybe 5-10 applications. And the application logic is distributed over 1000s of topics. You cannot just discard all this. And we do not have to discard this.
Maybe you consultants see most TWikis as a single application product because this is often what you are asked to implement for new customers. But this is not where TWiki is used to its full potential. It is in the larger companies where many departments are using a TWiki for all sorts of things that not even I could imagine. I still discover new small applications growing up here and there. There is no way to go back and rewrite all the existing content.
And the current storage format is actually quite good. It is readable and writable by humans (unlike HTML or XML or binary files). It is easy to keep an audit trail of the changes. It is easy to hack. A fact many of us take advantage of with small side scripts that hack the topics or even create topics. You can repair things when things go wrong. I did this just 2 days ago when some TWiki bug goofed up a topic. Our TML/Meta storage is a good old well working storage format. There is no need to discard it. Problem is that it is hopeless when it comes to indexing text, structural data, access rights etc. This is where the parallel storage comes into play.
So the best approach is to simply keep the .txt files as we know them, and implement a parallel storage - with the .txt files as the master. Ie. you can delete the parallel storage and rebuild it from the .txt files (how long that takes is not that critical).
All the .txt storage is used for in practice is editing the raw text, the audit trail, and regex searching. Any other process can happen from the parallel storage which for sure will be some sort of DB type storage.
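As a rough illustration of "the .txt files as the master", here is a sketch in which the parallel store is an SQLite table that can be dropped and rebuilt from the flat files at any time. The table layout and function names are assumptions invented for this example. The only thing taken from TWiki itself is the directory layout of one folder per web containing one .txt file per topic.

```python
import sqlite3
from pathlib import Path


def rebuild_index(data_dir, db):
    # The .txt files are the master copy, so the index can always be
    # thrown away and regenerated from scratch.
    db.execute("DROP TABLE IF EXISTS topics")
    db.execute("CREATE TABLE topics ("
               "web TEXT, topic TEXT, body TEXT, "
               "PRIMARY KEY (web, topic))")
    for path in sorted(Path(data_dir).glob("*/*.txt")):
        db.execute("INSERT INTO topics VALUES (?, ?, ?)",
                   (path.parent.name, path.stem,
                    path.read_text(encoding="utf-8")))
    db.commit()


def find_topics(db, needle):
    # Plain substring search answered from the parallel store,
    # without touching the flat files.
    rows = db.execute("SELECT web, topic FROM topics "
                      "WHERE instr(body, ?) > 0 ORDER BY web, topic",
                      (needle,))
    return ["%s.%s" % (web, topic) for web, topic in rows]
```

A real implementation would update the index incrementally on save and add proper full-text or structured indexes; the point of the sketch is only that nothing in the database is authoritative.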
I see no problem extending the .txt files in a compatible way and enable having sections that are pure HTML instead of TML as Crawford suggested. That is feasible and could enable very enhanced Wysiwyg editing for users that need this more than the ability to edit the raw content also.
--
KennethLavrsen - 08 Apr 2008
Kenneth, what would you propose to do away with island parsing?
Have a look at
just to name some, the list is much much longer. You as a customer are in love with those features most probably. Still TWiki is too slow for you and me.
One of the main reasons is that extracting information from the topic format isn't well defined. Each of these plugins does some sort of topic text analysis in a redundant way.
Just take tables: a lot of the above plugins need to access data stored in a TWikiTable. All of the plugins strive to grab the data on its own. Do you think this situation is all fine?
The problem is buried in the way TWiki stores its data, not offering any central services to access it. TML is a fine input language but not rich enough and too ambiguous as storage format.
--
MichaelDaum - 09 Apr 2008
Interesting discussion and great ideas, but will anything of it ever get implemented (and by whom)? Maybe in TWikiNG or was it TWikiXP?
--
FranzJosefGigler - 09 Apr 2008
Michael.
Yes, exactly!
The idea is that when you store a topic, the topic is stored in the usual raw format and in parallel in the new smarter format based on some sort of topic object model.
First thing to address for sure are tables. It is simple to imagine how to store TML based tables in any type of database because a database is in reality just a table.
Each time TWiki expands its topic object model we expand the Func API so that the plugins can for example read and write tables or edit and save sections of a topic.
My point is that we do not have to discard the .txt format to create the new format and by taking one step at a time there is a chance that it actually can get implemented. I wrote in another Codev topic that a good starting point would be to create the initial storage format so each time you save a topic the new storage handles
- Topic is parsed for access rights settings inside both meta and topic and the rights are stored in a set of DB tables that handle only access rights.
- Form data is saved in simple tables where the tables are built from the form definitions.
- Variable settings are stored in some DB format
- Tables are stored as tables in the DB format
- Entire topic text is stored in a format that is indexed for fast simple nonregex searching.
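For the "tables are stored as tables" step, the save-time parsing could start from something like the sketch below. It is a hypothetical central helper, not an existing TWiki function, and it recognises only the simplest form of a TML table: consecutive lines that begin and end with `|`. Cell markup such as `*Name*` is kept as-is, and features like continuation lines or cell spans are ignored.

```python
def extract_tables(text):
    # Return a list of tables; each table is a list of rows,
    # and each row is a list of cell strings.
    tables, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        is_row = (len(stripped) >= 2
                  and stripped.startswith("|")
                  and stripped.endswith("|"))
        if is_row:
            cells = stripped[1:-1].split("|")
            current.append([cell.strip() for cell in cells])
        elif current:
            # A non-table line closes the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables
```

Each returned table maps naturally onto rows to insert into a database table, and a single shared parser of this kind is also what would spare individual plugins from grabbing table data on their own.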
Accompanied by the right SEARCH syntax for searching data in tables, an API to access and write tables for the plugins, and naturally the existing API and core code taking advantage of the DB based access rights and indexes - we should get a significant performance gain already from 5.0.
I also think it is important that we do not try to do too much at the same time because then we will never get it done.
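A minimal sketch of the "parse on save" step described above, written in Python purely for illustration (TWiki itself is Perl). It assumes two TML conventions: a table row is a line that starts and ends with |, and a setting is a bullet of the form "   * Set NAME = value". The function name extract and the returned shapes are hypothetical; a real implementation would hand the results to whatever DB layer is chosen.

```python
import re

# Assumed TML conventions (not a full TML parser):
#   table row: a line that starts and ends with "|"
#   setting:   a bullet of the form "   * Set NAME = value"
ROW = re.compile(r"^\s*\|(.*)\|\s*$")
SETTING = re.compile(r"^(?:\t|   )+\*\s+Set\s+(\w+)\s*=\s*(.*)$")

def extract(topic_text):
    """Return (tables, settings) found in raw topic text."""
    tables, settings, current = [], {}, None
    for line in topic_text.splitlines():
        row = ROW.match(line)
        if row:
            if current is None:          # first row of a new table
                current = []
                tables.append(current)
            current.append([cell.strip() for cell in row.group(1).split("|")])
            continue
        current = None                   # any other line ends the table
        setting = SETTING.match(line)
        if setting:
            settings[setting.group(1)] = setting.group(2).strip()
    return tables, settings

topic = (
    "   * Set ALLOWTOPICCHANGE = Main.AdminGroup\n"
    "| *Name* | *Qty* |\n"
    "| Apples | 3 |\n"
    "Some text\n"
    "| x | y |\n"
)
tables, settings = extract(topic)
```

The sketch ignores column spans, continuation lines and verbatim blocks, which is exactly the kind of detail each plugin currently has to rediscover on its own.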
--
KennethLavrsen - 09 Apr 2008
I thought that the main point of this topic was to "get rid of the core dependency on
TML" (for whatever reason).
If we agree that the semantics of
TML are not expressive enough, and that it's not a very good storage format because of that, then getting rid of the "dependency" on
TML (which is not the same as getting rid of
TML) opens the door for choices: users can choose whatever syntax/format they want for their content.
The real culprit, the stone that is blocking our path of changing the storage, is SEARCH: Its behavior is too coupled with the current storage format & mechanism. And it's already too late to change that. That's stuff for another topic.
--
RafaelAlvarez - 09 Apr 2008
CHECKPOINT
As I said at the start, this isn't a feature request, it's a brainstorming.
Whether it gets implemented or not depends on the reader; if someone is
excited by the ideas discussed here, they will pick up the banner and
pursue them.
Notes on Compatibility and Scalability
I believe that the bottom line is that for any piece of code to be able
to call itself 'TWiki' it has to fulfil certain criteria:
- It has to be able to read and write existing TWiki databases without loss of content, and without compromising use of those databases with older TWiki versions.
- It has to support %SEARCH. That means it has to be able to search the database as if it was stored in TML (even if it isn't).
- TML plain-text editing has to be available for those who were brought up on TML.
- TML has to be available via the Plugins handler interface for those plugins that want to process it.
Unfortunately these are usually seen as 'kill points' when it comes to
making structured TWiki applications scale.
While I have seen scalability issues with unstructured
applications, they tend to be addressable using external tech such as google
appliances or Lucene. As such they don't impinge on the TWiki core.
As most applications scale up they become more and more structured. Some
lightweight structured applications, such as
BugsContrib, can go on
for quite a while using just the basic features of %SEARCH, but inevitably
they reach a point where they can scale no further. The application has to
move to caching technology such as
DBCacheContrib, but because of
the nature of %SEARCH that step is currently really unclean (it requires you to recode all your searches). Applications
that outgrow
DBCacheContrib currently have no choice but to migrate
to databases or other non-TWiki tech.
What we are looking for is a clean upgrade path that allows a
structured TWiki application to scale from a few topics up to hundreds of
thousands without having to be recoded several times. The underlying tech that
supports the application can change, but the application itself cannot. There
are two approaches to this:
1. Code really clever tech that is able to scale stupid text-search type structured applications by finding and optimising structured elements
2. Provide support in the basic application language such that application developers can capture structured applications that can then scale.
We have taken approach (2), as can be seen from the introduction of the
query language into basic %SEARCH. You can still
%SEARCH{"META:..., but
if you do, please don't expect your application to scale.
However
scalability is not the subject of this topic. There is no doubt in my mind
that solutions exist to address it. Right now, the
challenge for scalability-hunters is to find a developer motivated to actually
do it.
The Real Subject
What I was interested to find out was whether there is a "low hanging fruit" for TWiki;
viz use of
rich HTML (
HTML + TWiki variables) as the store form.
Such a move would not IMHO require a vast amount of recoding, it
would not block work on accelerators (cache, structured store etc), it would
leverage existing plugins as much as possible, it would unleash
WYSIWYG, it would enable
other applications that want to work on the topic
DOM, and
perhaps most importantly, it would be an evolutionary development. The main
negatives found so far are:
- many plugins rely on island parsers to interpret TML.
- "Old-style" TWiki applications rely on %SEARCH returning TML.
At this checkpoint:
- Colas thinks the low-hanging fruit is over-ripe and not worth picking.
- Rafael is interested in mapping Apples to Oranges, but sees the potential to use HTML as the baseline for both.
- Sven also sees the potential, and thinks that a "user-steered" option might work.
- Kenneth wants to be sure Apples always remain Apples, worms or no worms.
- Michael wants a more aggressively structured store, and the low-hanging fruit probably isn't juicy enough for him.
My gut tells me that Colas is right. The implementation compromises are too
deeply embedded, and too public, for there to be any significant gain from
moving to
HTML. Might as well throw the whole store away and start again with
a structured store; in which case, you are probably not working on TWiki
any more.
IMHO the inevitable conclusion is that TWiki needs to focus on making
TML as painless to use as possible; even if that means
enhancing it until it is as rich as
XHTML+CSS.
--
CrawfordCurrie - 10 Apr 2008
Well.. my point was not to map Apples to Oranges, but to allow Apples and Oranges to coexist in the same bag, and to use the appropriate peeler for each one to eat it.
Anyway, the irony of the situation is that trying to make
TML as rich as
XHTML+CSS also means that
TML must be as "complex" as
XHTML+CSS, and
TML was supposed to be a simple markup language.
To summarize, am I right to say that the two critical issues resulting from this topic are:
- There is no other option than trying to make TML as rich as XHTML+CSS
- Existing SEARCH rely on TML as the storage format
--
RafaelAlvarez - 10 Apr 2008
There is no other option.... that's the direction that TablePlugin, RenderListPlugin etc have taken (%TABLE etc). I'm not saying it has to go all the way, just that if you turn your back on an existing rich representation, you are inevitably going to end up inventing a new one.
Existing SEARCH rely... yes, that's a critical constraint. As is the use of island parsers in plugins.
--
CrawfordCurrie - 10 Apr 2008
Crawford: So we're in violent agreement
I still think that the steps listed should be taken.
- Removing the core dependency on TML is a good thing (again, this does not mean that TML will be removed from TWiki), as it reduces the coupling between the syntax and the core.
- Changing plugins and skins to use HTML instead of TML will only help to improve performance (it's a lot faster to just emit HTML than to emit the rendered version of TML). We should deprecate the use of TML in skins at least (as per the normal deprecation procedure): Two years should be more than enough for people to update their skins and plugins.
- A meta tag indicating that a topic has 100% HTML content will allow those users that only use WYSIWYG to create topics that will render faster (no TML processing), while allowing advanced users to create complex TWikiApplications with TML. Notice that the storage format is the same regarding META tags, only the content won't contain any TML. This means that current content will still be valid, and new content can be generated in "pure HTML" as desired.
--
RafaelAlvarez - 10 Apr 2008
Good brainstorming discussions here, and good that we agree that this is brainstorming. Although I see the benefits of using
HTML instead of
TML as the native format consider this: Depending on the
HTML generator / editor used we get code like these few examples for the same visual result:
-
<img src="%ATTACHURL%/mail.gif" width="20" height="10" align="right" alt="" /> mail
-
<img src='%ATTACHURL%/mail.gif' width='20' height='10' align='right' alt='' /> mail
-
<img width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10" /> mail
-
<IMG width=20 ALIGN=right ALT= SRC=%ATTACHURL%/mail.gif HEIGHT=10> mail
-
<IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail
-
<IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail</IMG>
- and 20 other ways...
These are real examples on twiki.org (seen frequently when I clean up the
Sandbox.WebHome page.) With so many variations it is impossible to do good version control and content analysis/manipulation; last but not least,
TML >
HTML >
TML roundtrips.
I think it is not a good idea to consider
HTML as a data store the way current
HTML editors/generators produce code.
If we change to a different format it should be
XML with a well defined DTD.
I am with Kenneth, there are a gazillion pages out there and we cannot move away from
TML without compatibility in mind. I think that the current
TML is relatively well defined (others might disagree) and that with proper caching (such as into db tables and
HTML) and pre-rendering (the JSP / ASP way) we can address the performance issues.
If we add pluggable syntax to TWiki we raise the complexity considerably and we get into DLL hell issues ("my pages I imported from our partner company's TWiki does not render properly", "the pages I restored from backup for the 5 year audit don't work", ...)
--
PeterThoeny - 10 Apr 2008
If this is brainstorming, you have just
violated the rules.
Please, unusual ideas are welcome.
--
ArthurClemens - 10 Apr 2008
"Kenneth wants to be sure Apples always remain Apples, worms or no worms."
Not sure I understand where you want to go with that analogy??? Never mind!
The original proposal that topics can have sections of
HTML and sections of
TML I supported from the beginning. The arguments about maintaining .txt storage and having a parallel storage were to address the other proposals of replacing the
TML based .txt files by something completely different like e.g. some xml based format. Changing the .txt format to something else will be the same as terminating the TWiki project and starting a new and different project.
I do not see the point in trying to enhance
TML too much.
TML works only because it is simple. If people need more advanced formatting then they will not be editing text files but using some more advanced tool, so why invent a new markup? Then good old
HTML will do just fine and TMCE can produce this
HTML well already.
A backwards compatible way to extend the .txt format would be to assume
TML as the default and allow sections to be pure
HTML +
TWikiVariables. This section can be the entire topic if people desire this.
Plugins like EditTablePlugin and TablePlugin are not really relevant in an
HTML-only section.
I think it is possible to enhance the .txt format without breaking compatibility.
And we will still be able to implement the parallel DB storage format for access rights and forms as this is totally independent of whether the topic content is
TML or
HTML.
--
KennethLavrsen - 10 Apr 2008
The point about
diff only applies if you use a plain text diff on the source, and when diffing
HTML you don't; you normalise it first. It's easy (and fast) to normalise
HTML syntax by running it through
HTML::Parser and a simple generator. If you store normalised
HTML, you don't have this problem in the first place.
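To make the normalisation idea concrete, here is a minimal sketch. It uses Python's standard html.parser as a stand-in for Perl's HTML::Parser; the class name Normaliser, the VOID set and the chosen canonical form (lower-case names, attributes sorted, double quotes, void elements self-closed) are assumptions for illustration only, not an agreed TWiki format.

```python
from html import escape
from html.parser import HTMLParser

# Assumed subset of elements that never have content.
VOID = {"img", "br", "hr", "input", "meta", "link"}

class Normaliser(HTMLParser):
    """Parse HTML and re-emit it in one canonical spelling."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The parser already lower-cases tag and attribute names.
        parts = [tag]
        for name, value in sorted(attrs, key=lambda a: a[0]):
            parts.append('%s="%s"' % (name, escape(value or "", quote=True)))
        close = " />" if tag in VOID else ">"
        self.out.append("<" + " ".join(parts) + close)

    def handle_endtag(self, tag):
        if tag not in VOID:              # drop stray </img> and friends
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

def normalise(html):
    parser = Normaliser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

variants = [
    '<img src="%ATTACHURL%/mail.gif" width="20" height="10" align="right" alt="" /> mail',
    "<img src='%ATTACHURL%/mail.gif' width='20' height='10' align='right' alt='' /> mail",
    '<img width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10" /> mail',
    '<IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail',
    '<IMG width="20" align="right" alt="" src="%ATTACHURL%/mail.gif" height="10"> mail</IMG>',
]
canonical = {normalise(v) for v in variants}
```

All five variants collapse to a single string, so a plain text diff on the normalised form sees no difference. The unquoted variant with an empty ALT= is deliberately left out: how a given parser recovers from that markup is parser-specific, and this sketch makes no claim about it.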
Note also that
XHTML is XML with a well defined DTD.
My conclusion that Colas is right stems from the nature of TWiki variables; you can't store structured content when you allow them, so why bother trying to store structured content? Consider:
* Set OB = <
* Set CB = >
Oh%OB%P%CB%No!</P%CB%
This is a trivial example of a fundamental problem with the idea of a structured store for content; you can't tell what the structure is until all TWiki variables have been expanded. You can also see that it's impossible to render this stored data as anything other than
HTML. The meta-syntax of the output has been predetermined by what is stored.
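The point can be shown with a small sketch. This is an illustration in Python, not TWiki code: expand is a hypothetical, much simplified stand-in for TWiki variable expansion, and start_tags simply lists the opening tags an HTML parser can find.

```python
import re
from html.parser import HTMLParser

def expand(text, settings):
    """Replace %NAME% with its setting; unknown names are left untouched."""
    return re.sub(r"%([A-Z]+)%",
                  lambda m: settings.get(m.group(1), m.group(0)), text)

class StartTags(HTMLParser):
    """Collect the names of all opening tags the parser recognises."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)

def start_tags(text):
    parser = StartTags()
    parser.feed(text)
    parser.close()
    return parser.seen

settings = {"OB": "<", "CB": ">"}
stored = "Oh%OB%P%CB%No!</P%CB%"
expanded = expand(stored, settings)

before = start_tags(stored)      # no opening tag is visible in the stored text
after = start_tags(expanded)     # the paragraph only exists after expansion
```

Anything that tries to build a DOM from the stored text sees no paragraph at all; the element only comes into existence once the variables have been expanded.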
JotSpot solve this by requiring their equivalent of TWiki variables to be well-formed
XML. You
cannot inject arbitrary syntax into a
JotSpot topic - which is a major strength for structured applications, but a major weakness for learning and flexibility.
I do not see the point in trying to enhance TML too much - the problem with this is well illustrated by tables.
HTML has incredibly rich support for tables. At the moment,
WYSIWYG has no choice but to store complex tables as
HTML, as
TML lacks support for anything but the simplest tables.
TWiki could take two approaches to address this problem:
1. The plugin approach - enhance TablePlugin and friends until they can support all the formatting.
2. The MediaWiki approach - enhance the markup language until it provides much the same support for table formatting as HTML does.
By deferring to TablePlugin we have implicitly selected (1), which creates a problem for
HTML2TML. It's (relatively) easy to map from %TABLE tags to
HTML, but the reverse mapping is a nightmare. This is because
HTML is rich, and if all this richness is used to define a table, the resulting
HTML has many different ways to define the same thing, all of which have to be mapped back to a %TABLE parameter. A simple illustration:
<td class="redBackground">...</td><td style="background-color:red">...</td><td bgcolor="red">...</td>
The result of this is that
HTML2TML basically ends up stripping out all the careful formatting someone does in
WYSIWYG - or, more importantly, their existing
HTML when
HTML2TML is used to import content from another source. This makes TWiki look really bad, especially when
WYSIWYG is used exclusively in TWiki; users just can't see why it can't retain their formatting. So when a complex table is imported, we skip translation to
TML and keep it as
HTML. This drift to
HTML begs the question "why not store the whole topic in
HTML", which is where we started. Why burden ourselves with
TML when it is only useable for a small fraction of the content?
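The many-to-one reverse mapping illustrated by the three red cells above can be sketched as follows. This is a hedged illustration in Python: CLASS_COLOURS is a hypothetical stand-in for a stylesheet lookup, and only background colour is handled. A real HTML2TML would need an equivalent rule for every formatting property before it could emit a single %TABLE parameter.

```python
from html.parser import HTMLParser

# Hypothetical stylesheet lookup: class name -> background colour.
CLASS_COLOURS = {"redBackground": "red"}

class CellColours(HTMLParser):
    """Reduce the different spellings of a cell background to one value."""

    def __init__(self):
        super().__init__()
        self.colours = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        a = dict(attrs)
        colour = a.get("bgcolor")                      # legacy attribute
        if colour is None and a.get("style"):          # inline CSS
            for decl in a["style"].split(";"):
                prop, _, value = decl.partition(":")
                if prop.strip().lower() == "background-color":
                    colour = value.strip()
        if colour is None:                             # CSS class
            for cls in (a.get("class") or "").split():
                colour = CLASS_COLOURS.get(cls, colour)
        self.colours.append(colour)

def cell_colours(html):
    parser = CellColours()
    parser.feed(html)
    parser.close()
    return parser.colours

row = ('<td class="redBackground">...</td>'
       '<td style="background-color:red">...</td>'
       '<td bgcolor="red">...</td>')
colours = cell_colours(row)
```

Even this single property needs three code paths plus knowledge of the stylesheet, which gives a feel for why the reverse mapping is so much harder than the forward one.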
On the flip side of the coin, the simple structure of TWiki tables has enabled powerful plugins such as EditTablePlugin and SpreadSheetPlugin. It would be nice to be able to use these plugins on tables with complex formatting. AFAIK there are only three ways to do this:
- Extend the plugins so they parse HTML (the island parser approach)
- Modify the plugins so they no longer know TML but work off a DOM (the central service approach Micha advocates above)
- Extend TML and the Plugins to support complex formatting (the MediaWiki approach)
Tables are really just the tip of the iceberg; there are many other niggly areas where current
TML just doesn't cut the mustard. Hence the call to extend
TML.
--
CrawfordCurrie - 11 Apr 2008
Now don't choke on your coffee or tea but a question from the outside of core dev view. TWiki should evolve, ok, but could it be possible to reverse the problem and say that a new TWiki version works without
TML in core, and if you need rendering of
old syntax you must install the
RenderTMLContrib or something? Then existing customers and new can use good ol'
TML if they wish.
Or a compatibility switch in configure leveraging new versions of core modules. Other products have features like that; you can switch mode one time and then you cannot switch back. I have no idea how much labour it would take or how hard it would be technically to achieve, just a simple thought.
--
LarsEik - 11 Apr 2008
No risk of that, Lars, as it's a reasonable idea. It's a long step further than I was thinking of going, as it would require extensive re-architecting of the TWiki core, but there's no reason that shouldn't be done if someone were committed enough.
--
CrawfordCurrie - 12 Apr 2008