WYSIWYG nitty gritty

2007-08-27 - 22:42:26 by CrawfordCurrie in Development

In last night's TWiki release meeting I was asked for some detail on some of the design decisions I made during the WYSIWYG development for TWiki. Rather than answer them piecemeal on IRC, I decided to write this topic. Ask any detailed questions in replies, and I'll try and answer them. You never know, we might uncover a better approach!

How "Javascript editors" work

Modern browsers all have some way of editing HTML in the browser. That means there are some functions for handling mouse movements over a rendered DOM, and operations for manipulating the DOM itself. The most widely adopted API is Midas, formalised by Mozilla but rooted in IE. Midas is fairly well supported in IE and Mozilla, and partially in newer Safari versions, but the API spec is vague and open to liberal interpretations. It is also incomplete, with some fairly important manipulation functions totally missing.

Despite this there are a number of "Javascript editors", which are actually just thin layers over this DOM manipulation technology. These are widely used in CMSes that store their topics in raw HTML, and less widely in wikis.

How TWiki Wysiwyg works

The TWiki Wysiwyg integrations are all based on a simple principle: convert TWiki topics to HTML, edit the HTML using one of these Midas editors, and convert the edited HTML back to TML for saving. The WysiwygPlugin does these two translations, and the TinyMCEPlugin and KupuContrib are thin wrappers around these editors that work with the WysiwygPlugin. The vast bulk of my work on Wysiwyg has been focused on the translators.

Getting from TML to HTML

The main challenge faced when converting from TML to HTML is to do it in such a way that the original TML is recoverable. The normal "rendering pipeline" in TWiki is what is called a "lossy conversion". It is lossy because important information about the topic structure is thrown away during the conversion. For example, %I% is converted to an image tag, with no record of the fact it was originally a TWiki variable.

This means the standard rendering pipeline can't be used, and a bespoke convertor had to be developed to support the conversion without information loss. The convertor works by recognising TML constructs that it knows it can handle, and converting them to HTML. Constructs it can't handle, such as TWiki variables, are simply protected as CDATA in HTML spans, and are edited as plain text in the editor.
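The protection idea can be sketched in a few lines of Python. This is only an illustration, not the WysiwygPlugin code: the class name `protected-tml`, the simplified variable pattern and the function names `protect` and `unprotect` are all assumptions made for the example.

```python
import html
import re

# Hypothetical marker class; the real plugin's markup may differ.
PROTECTED_CLASS = "protected-tml"

# Simplified pattern for a TWiki variable such as %I% or %SEARCH{"x"}%.
VARIABLE = re.compile(r"%[A-Z][A-Z0-9_]*(?:\{[^}]*\})?%")
SPAN = re.compile(r'<span class="%s">(.*?)</span>' % PROTECTED_CLASS, re.S)


def protect(tml: str) -> str:
    """Wrap each variable in a span, escaped so the editor shows it as plain text."""
    return VARIABLE.sub(
        lambda m: '<span class="%s">%s</span>'
        % (PROTECTED_CLASS, html.escape(m.group(0))),
        tml,
    )


def unprotect(markup: str) -> str:
    """Restore the original variable text from each protected span."""
    return SPAN.sub(lambda m: html.unescape(m.group(1)), markup)
```

Because the variable text is carried through the edit untouched, `unprotect(protect(text))` gives back the original text, which is exactly what the standard rendering pipeline cannot do.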

Embedded HTML represents an additional challenge, as ideally you want to keep this HTML safe throughout the edit and ensure it reappears in the TML. This isn't as easy as it sounds, for a number of reasons.

- HTML tags hand-edited into the TML may be ill-formed, and the editors explode if fed them. You cannot parse the HTML tags to make sure they are well formed, because you risk changing the intent of the original tags.
- The editors cannot easily be made to respect such tags (treat them as uneditable). Flagging them using classes works for some editors, some of the time, but is totally unreliable. As a result the editors can easily munge the tags beyond repair, e.g. by removing / adding classes, even changing the tag type. Users don't like it when their carefully crafted HTML tags are "eaten" this way.

For these reasons - and others - I made the decision to treat hand-entered HTML tags as sacrosanct, and protect them in the same way as TWiki variables are protected.

Javascript editors are improving at a rapid pace, and we can expect to have to continually adapt to the latest and greatest. The strategy I used in the translator is to do the minimum necessary to provide the editor with robust, clean HTML, and most importantly, keep it editor independent.

On that note, PeterThoeny asked why not tag the HTML generated in the translator to protect it when pasting in new content? A good question. The answer is that it doesn't really buy you anything. The Javascript editor is perfectly capable of pre-processing the HTML (for example on load) to tag HTML as protected, and then post-processing to perform special actions on tags that have been pasted from external sources, without involving the translators. Basically, it's the editor's job to get pasting right, not the translator's, and to make sure this happens we need to move our attention upstream.

Getting from HTML to TML

Getting from HTML back to TML is in some ways simpler than the opposite transform. HTML coming from editors is well structured, and in theory all you need is an HTML parser and a syntax generator. Sadly, this is only the tip of the iceberg.
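The "parser plus syntax generator" idea can be sketched with Python's standard `html.parser` module. This is a toy under stated assumptions, not the plugin's translator: the class `TinyTmlGenerator` and the function `html_to_tml` are invented for the example, and it handles only bold, italic, paragraphs and headings while ignoring the line-ending rules.

```python
from html.parser import HTMLParser

# Inline tags mapped to the TML marker that wraps their content.
INLINE = {"b": "*", "strong": "*", "i": "_", "em": "_"}


def is_heading(tag: str) -> bool:
    """True for h1 to h6."""
    return len(tag) == 2 and tag[0] == "h" and tag[1] in "123456"


class TinyTmlGenerator(HTMLParser):
    """Toy generator: bold, italic, paragraphs and headings only."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif is_heading(tag):
            # TML headings look like "---++ Title" for level 2.
            self.out.append("---" + "+" * int(tag[1]) + " ")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "p" or is_heading(tag):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)


def html_to_tml(markup: str) -> str:
    parser = TinyTmlGenerator()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For well-structured input such as `<h2>Title</h2><p>Some <b>bold</b> text</p>` this produces a heading line followed by `Some *bold* text`. Everything that makes the real job hard is missing from it.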

Unlike HTML, TML is a line-oriented language, and has some curious and rather subtle rules about line endings, which present a particular problem when generating line breaks. So the chosen approach is to generate the TML text in a buffer using special "hint" characters that are post-processed to generate the final topic layout. For example, one of the hints is "there needs to be whitespace here". This character is collapsed if the final output already has whitespace in the right place, and converted to a space if not.
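A minimal sketch of the whitespace hint, in Python, might look like the following. The choice of a private-use code point as the hint character and the name `resolve_whitespace_hints` are assumptions for illustration; the actual translator may encode its hints quite differently.

```python
import re

# Hypothetical hint: a private-use code point meaning "whitespace needed here".
WS_HINT = "\ue000"


def resolve_whitespace_hints(buffer: str) -> str:
    """Drop hints that already touch whitespace; turn the rest into one space."""
    # Collapse runs of hints so adjacent hints yield a single space at most.
    buffer = re.sub(WS_HINT + "+", WS_HINT, buffer)
    # A hint directly after or before existing whitespace is redundant.
    buffer = re.sub(r"(?<=\s)%s|%s(?=\s)" % (WS_HINT, WS_HINT), "", buffer)
    # Any surviving hint marks a spot that really needs a space.
    return buffer.replace(WS_HINT, " ")
```

The generator can then emit a hint wherever TML syntax requires separation, without needing to know what the neighbouring output will turn out to be.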

The real nightmares in this step though are bullet lists and tables. The designers of TML took the decision - had the luxury - of supporting a subset of the formatting options available in HTML. For example, TWiki tables are restricted to a single level (no tables in tables). This causes horrendous problems in the syntax generation, because it has to be context sensitive, and know when to give up. The translator also has to know when it has to generate a linebreak (inside verbatim or pre), and when it can't generate a linebreak (inside tables and lists). In general it does a pretty good job when dealing with existing TWiki topics, but when pasting in HTML from other sources you will often see HTML in the topic, where the translator just had to give up trying to generate TML, or risk losing important formatting. Because we want to continue to use TML, we have to be prepared to throw quite a bit of imported formatting away - for example, unrecognised classes on spans.
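The "know when to give up" rule for tables can be illustrated with a small Python sketch. It assumes the simplest possible policy: a flat table becomes TML rows, and anything containing a nested table is left as HTML. The names `TableReader` and `table_to_tml` are made up for the example, and cell content is treated as plain text.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collects cell text row by row and records the deepest <table> nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.rows = []
        self.cell = None  # list of text fragments while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
        elif tag in ("td", "th") and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_tml(markup: str) -> str:
    """Return TML rows for a flat table, or the HTML unchanged if it nests."""
    reader = TableReader()
    reader.feed(markup)
    reader.close()
    if reader.max_depth != 1:
        return markup  # give up: TML tables cannot contain tables
    return "\n".join("| " + " | ".join(row) + " |" for row in reader.rows)
```

Returning the HTML untouched is the safe fallback here, since forcing a nested table into TML would silently lose structure.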

Why did Wysiwyg recently get so much better?

Anyone who uses the Wysiwyg in the currently checked-in code will have noticed how much more reliable it is. It is much less likely to create issues in your topic text. The reason is quite simple: a sponsor, ScottBlack, stepped up, and enabled me to spend a few days working on the guts of the translator. During that time I simplified the code significantly, removing much of the "TWiki specific" handling, such as interpretation of some TWiki variables. Simplifying the code this way led to a significant improvement in reliability. KISS!

Another factor is the shift to TinyMCE as the editor of choice. This little editor is less powerful than Kupu, but is also smaller and significantly easier to integrate. It also has much better table editing facilities.

Other approaches

Server versus client

The WysiwygPlugin is a server-side translator solution. As such it imposes a load on the server, and can be hard to integrate into the right places in TWiki. So why not simply write the translator in Javascript, and do it client-side? Well, for three reasons. First, the translator code is significant; loading it into the browser takes time and bandwidth. Second, and most importantly, every browser manufacturer has implemented the DOM differently, and making client-side Javascript portable between browsers is a nightmare. Third, by performing the translation server-side we are able to leverage all the pre-existing support in TWiki that would otherwise need to be duplicated client-side.

Having said that, I have implemented the TML to HTML translation in JS as a demonstrator, and the performance is very good.

Change the store model

As noted above, storing topics in HTML makes life a heck of a lot easier, as they can be edited without a translation step. An alternative approach, originally proposed by MichaelSparks, is to store the topic in HTML (or a similar neutral format) and convert to TML only for the purposes of editing in a plain text editor. This approach has some obvious advantages, but has many detractors so has never been fully explored.

Conclusions

I hope I have given you some insight into the problems faced doing WYSIWYG in TWiki. It's a really interesting area, but absorbs a huge amount of effort - I estimate I have spent over 200 hours working on the current solution. So please ask any questions below, and I'll do my best to respond, and if you feel you are in a position to help sponsor further improvements, get in touch.

Comments

EmanueleCupido - 29 Aug 2007:

On the subject of tml->HTML translation being lossy:

Having the following tml code

%I% Look at this!

and applying tml->HTML translation to it, we get the following HTML code:

<img src="/wiki412/pub/TWiki/TWikiDocGraphics/tip.gif" alt="IDEA!" title="IDEA!" width="16" height="16" border="0" /> Look at this!

The TWiki variable originally contained in the tml is not recoverable from the HTML, hence the original tml is lost.

How about tagging the HTML in such a way it carries info about TWiki variables? For instance, the HTML after translation could look something like:

<img src="/wiki412/pub/TWiki/TWikiDocGraphics/tip.gif" alt="IDEA!" title="IDEA!" width="16" height="16" border="0" /><!--twiki_origin=%^I%^--> Look at this!

In this case the original tml should be recoverable.
But I am sure there are 100s of other reasons why this approach cannot work (in addition to the effort for implementing it).

(By the way, I've first tried with a tag like <!--twiki_origin=%I%--> but TWiki would still expand the %I%. Hence the use of %^. I guess no attention is paid to <!-- -->. Or perhaps it's an intentional behavior?)

CedricWeber - 03 Sep 2007:

Thanks for that update on the WYSIWYG developments! This feature is the one most requested by corporate customers. And it's good to have a blog now!

Topic revision: r3 - 2007-08-28 - CrawfordCurrie
