In last night's TWiki release meeting I was asked for some detail on some of the design decisions I made during the WYSIWYG development for TWiki. Rather than answer them piecemeal on IRC, I decided to write this topic. Ask any detailed questions in replies, and I'll try and answer them. You never know, we might uncover a better approach!
Modern browsers all have some way of editing HTML in the browser. That means there are some functions for handling mouse movements over a rendered DOM, and operations for manipulating the DOM itself. The most widely adopted API is Midas, formalised by Mozilla but rooted in IE. Midas is fairly well supported in IE and Mozilla, and partially in newer Safari versions, but the API spec is vague and open to liberal interpretations. It is also incomplete, with some fairly important manipulation functions totally missing.
How TWiki Wysiwyg works
The TWiki Wysiwyg integrations are all based on a simple principle: convert TWiki topics to HTML, edit the HTML using one of these Midas editors, and convert the edited HTML back to TML for saving. The WysiwygPlugin does these two translations, and the TinyMCEPlugin and KupuContrib are thin wrappers around these editors that work with the WysiwygPlugin. The vast bulk of my work on Wysiwyg has been focused on the translators.
Getting from TML to HTML
The main challenge faced when converting from TML to HTML is to do it in such a way that the original TML is recoverable. The normal "rendering pipeline" in TWiki is what is called a "lossy conversion". It is lossy because important information about the topic structure is thrown away during the conversion. For example, %I% is converted to an image tag, with no record of the fact it was originally a TWiki variable.
This means the standard rendering pipeline can't be used, and a bespoke converter had to be developed to support the conversion without information loss. The converter works by recognising TML constructs that it knows it can handle, and converting them to HTML. Constructs it can't handle, such as TWiki variables, are simply protected as CDATA in HTML spans, and are edited as plain text in the editor.
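To make the protection idea concrete, here is a minimal sketch in Python (not the plugin's actual Perl code). The class name `protected`, the simplified variable pattern, and both function names are assumptions made for illustration. The sketch only handles variables and plain text, and it uses HTML escaping inside the span rather than whatever encoding the real converter uses.

```python
import html
import re

# Hypothetical marker class; the real plugin's markup may differ.
PROTECTED_CLASS = "protected"

# Simplified pattern for TWiki variables such as %I% or %TOC{"depth=2"}%.
VARIABLE = re.compile(r"%[A-Z][A-Z0-9_]*(?:\{[^}]*\})?%")


def protect_variables(tml: str) -> str:
    """Wrap each variable in a span so it survives editing as plain text."""
    out = []
    pos = 0
    for match in VARIABLE.finditer(tml):
        # Text between variables is escaped so it is safe inside HTML.
        out.append(html.escape(tml[pos:match.start()]))
        out.append('<span class="%s">%s</span>'
                   % (PROTECTED_CLASS, html.escape(match.group(0))))
        pos = match.end()
    out.append(html.escape(tml[pos:]))
    return "".join(out)


def restore_variables(page: str) -> str:
    """Unwrap the protected spans and undo the escaping."""
    unwrapped = re.sub(r'<span class="%s">(.*?)</span>' % PROTECTED_CLASS,
                       r"\1", page, flags=re.S)
    return html.unescape(unwrapped)
```

Because the variable text is carried through verbatim, `restore_variables(protect_variables(text))` gives back the original text for inputs like `%I% Look at this!`, which is exactly the property the standard rendering pipeline lacks.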
Embedded HTML represents an additional challenge, as ideally you want to keep this HTML safe throughout the edit and ensure it reappears in the TML. This isn't as easy as it sounds, for a number of reasons.
HTML tags hand-edited into the TML may be ill-formed, and the editors explode if fed them. You cannot parse the HTML tags to make sure they are well formed, because you risk changing the intent of the original tags.
The editors cannot easily be made to respect such tags (treat them as uneditable). Flagging them using classes works for some editors, some of the time, but is totally unreliable. As a result the editors can easily munge the tags beyond repair e.g. by removing / adding classes, even changing the tag type. Users don't like it when their carefully crafted HTML tags are "eaten" this way.
For these reasons - and others - I made the decision to treat hand-entered HTML tags as sacrosanct, and protect them in the same way as TWiki variables are protected.
Getting from HTML to TML
Getting from HTML back to TML is in some ways simpler than the opposite transform. HTML coming from editors is well structured, and in theory all you need is an HTML parser and a syntax generator. Sadly, this is only the tip of the iceberg.
Unlike HTML, TML is a line-oriented language, and has some curious and rather subtle rules about line endings that present a particular problem when generating line breaks. So the chosen approach is to generate the TML text in a buffer using special "hint" characters that are post-processed to generate the final topic layout. For example, one of the hints is "there needs to be whitespace here". This character is collapsed if the final output has a whitespace in the right place, and converted to a space if not.
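The whitespace hint can be sketched as follows. This is an illustration in Python, not the translator's real code: the choice of a private-use code point as the hint character and the function name are assumptions, and the real translator has more than this one hint.

```python
import re

# Hypothetical hint character (a private-use code point) meaning
# "there needs to be whitespace here".
WS_HINT = "\ue000"


def resolve_whitespace_hints(buffer: str) -> str:
    """Post-process a generated buffer, resolving whitespace hints."""
    # Several hints in a row ask for the same thing, so treat them as one.
    text = re.sub(WS_HINT + "+", WS_HINT, buffer)
    # A hint that already has real whitespace next to it is satisfied: drop it.
    text = re.sub(r"(?<=\s)" + WS_HINT, "", text)
    text = re.sub(WS_HINT + r"(?=\s)", "", text)
    # Any hint still left has no whitespace around it, so it becomes a space.
    return text.replace(WS_HINT, " ")
```

The point of deferring the decision is that the generator for one construct doesn't need to know what the neighbouring construct emitted; it just records the requirement and lets the final pass sort it out.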
The real nightmares in this step though are bullet lists and tables. The designers of TML took the decision - had the luxury - of supporting a subset of the formatting options available in HTML. For example, TWiki tables are restricted to a single level (no tables in tables). This causes horrendous problems in the syntax generation, because it has to be context sensitive, and know when to give up. The translator also has to know when it has to generate a linebreak (inside verbatim or pre), and when it can't generate a linebreak (inside tables and lists). In general it does a pretty good job when dealing with existing TWiki topics, but when pasting in HTML from other sources you will often see HTML in the topic, where the translator just had to give up trying to generate TML, or risk losing important formatting. Because we want to continue to use TML, we have to be prepared to throw quite a bit of imported formatting away - for example, unrecognised classes on spans.
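As a rough illustration of "knowing when to give up", the sketch below converts a flat HTML table to TML rows, but returns the HTML untouched as soon as it sees a table nested inside a table. It is written in Python with the standard `html.parser` module; the class and function names are hypothetical, and the real translator handles far more cases (lists in cells, line breaks, attributes) than this does.

```python
from html.parser import HTMLParser


class FlatTableReader(HTMLParser):
    """Collects cell text from a single-level table and notes any nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current <table> nesting depth
        self.nested = False   # set once a table inside a table is seen
        self.rows = []        # list of rows, each a list of cell strings
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.nested = True
        elif tag == "tr" and self.depth == 1:
            self.rows.append([])
        elif tag in ("td", "th") and self.depth == 1 and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth = max(0, self.depth - 1)
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        # Only text is kept; inline markup inside cells is dropped here.
        if self.in_cell and self.depth == 1:
            self.rows[-1][-1] += data


def table_to_tml(fragment: str) -> str:
    """Return TML table rows, or the original HTML if TML can't express it."""
    reader = FlatTableReader()
    reader.feed(fragment)
    reader.close()
    if reader.nested or not reader.rows:
        return fragment  # give up: keep the HTML rather than lose structure
    return "\n".join("| " + " | ".join(cell.strip() for cell in row) + " |"
                     for row in reader.rows)
```

Falling back to the original HTML is the behaviour described above: the user sees HTML in the topic, but nothing is silently lost.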
Why did Wysiwyg recently get so much better?
Anyone who uses the Wysiwyg in the currently checked-in code will have noticed how much more reliable it is. It is much less likely to create issues in your topic text. The reason is quite simple: a sponsor, ScottBlack, stepped up, and enabled me to spend a few days working on the guts of the translator. During that time I simplified the code significantly, removing much of the "TWiki specific" handling, such as interpretation of some TWiki variables. Simplifying the code this way led to a significant improvement in reliability. KISS!
Another factor is the shift to TinyMCE as the editor of choice. This little editor is less powerful than Kupu, but is also smaller and significantly easier to integrate. It also has much better table editing facilities.
Server versus client
Having said that, I have implemented the TML to HTML translation in JS as a demonstrator, and the performance is very good.
Change the store model
As noted above, storing topics in HTML makes life a heck of a lot easier, as they can be edited without a translation step. An alternative approach, originally proposed by MichaelSparks, is to store the topic in HTML (or a similar neutral format) and convert to TML only for the purposes of editing in a plain text editor. This approach has some obvious advantages, but has many detractors so has never been fully explored.
I hope I have given you some insight into the problems faced doing WYSIWYG in TWiki. It's a really interesting area, but absorbs a huge amount of effort - I estimate I have spent over 200 hours working on the current solution. So please ask any questions below, and I'll do my best to respond, and if you feel you are in a position to help sponsor further improvements, get in touch.
EmanueleCupido - 29 Aug 2007:
On the subject of tml->HTML translation being lossy:
Having the following tml code
%I% Look at this!
and applying tml->HTML translation to it, we get the following HTML code:
The TWiki variable originally contained in the tml is not recoverable from the HTML, hence the original tml is lost.
How about tagging the HTML in such a way it carries info about TWiki variables? For instance, the HTML after translation could look something like:
In this case the original tml should be recoverable.
But I am sure there are 100s of other reasons why this approach cannot work (in addition to the effort for implementing it)
(By the way, I've first tried with a tag like
but TWiki would still expand the
. Hence the use of %^. I guess no attention is paid to
. Or perhaps it's an intentional behavior?)
CedricWeber - 03 Sep 2007:
Thanks for that update on the WYSIWYG developments! This feature is the one most requested by corporate customers.
And - it's good to have a Blog now!