
Html Filtering Plugin Development/Brainstorming

This page is for brainstorming ideas on a plugin that lets good html through into topics (e.g. <p>, <form>) and stops bad html (e.g. <script>) from being displayed when topics are viewed.

For this plugin to work on display, it must be called before any other plugin processes the page. Otherwise it could filter out a plugin's html rather than user-added html.

However, if this filter can be applied to the topic when it is saved, it would cut down on processing, and permit particular users to pass "bad" html through the filter. I think the beforeSaveHandler hook in the current alpha allows this.

I envision the HtmlFilterPlugin page having preferences that allow users or groups to use certain tags in the pages they save, e.g.

    • Set AllowJavaScript = TWikiAdminGroup

would mean that anybody in the TWikiAdminGroup would be allowed to save pages including javascript. However the javascript tags would be removed (or sanitized) if somebody outside of the TWikiAdminGroup saved the page.
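
A minimal sketch of that save-time check, written in Python purely for illustration (a real TWiki plugin would be Perl and would hook beforeSaveHandler). The preference table, the group table, and the function names are all hypothetical, and the regex strip is only a stand-in for a real filter: it removes script elements but does nothing about event handler attributes.

```python
import re

# Hypothetical preference table: preference name -> groups allowed to use the feature.
PREFERENCES = {"AllowJavaScript": {"TWikiAdminGroup"}}

# Hypothetical group membership; a real plugin would ask TWiki for this.
GROUPS = {"TWikiAdminGroup": {"JohnRouillard"}}

# Matches a whole script element, or an unclosed one running to the end of the text.
SCRIPT_BLOCK = re.compile(
    r"<script\b.*?(?:</script\s*>|\Z)", re.IGNORECASE | re.DOTALL
)


def user_may(user, preference):
    """Return True if the user belongs to any group named by the preference."""
    allowed_groups = PREFERENCES.get(preference, set())
    return any(user in GROUPS.get(group, set()) for group in allowed_groups)


def before_save(user, text):
    """Return the text to store: unchanged for allowed users, stripped otherwise."""
    if user_may(user, "AllowJavaScript"):
        return text
    return SCRIPT_BLOCK.sub("", text)
```

With these tables, before_save("JohnRouillard", text) returns the text untouched, while the same text saved by a user outside the group loses its script elements.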

Using INCLUDE, "bad" html pages can be created and the functionality made available in pages editable by ordinary users.

I originally thought that the filter should pass bad html only if the edited page had an ALLOWTOPICCHANGE that restricted permissions to the allowed group. However I don't think that's needed. If unauthorized users edit the page, the "bad" html can be recovered from RCS, or the bad html tag can just be mangled (javascript->scriptjava) into an innocuous form, waiting for the next authorized user to reverse the change.

  • One problem is that if an unauthorized user edits the page, the script that he adds could then hijack the authentication of the authorized user; so that when the authorized user views the page the script could, for example, do an HTTP POST / edit-save to a page that the original unauthorized user would not have been able to edit directly. -- DaleBrayden - 11 Mar 2003

So the questions are:

  • What are safe html tags?
  • What are bad html tags?
  • Can the input be filtered correctly so that only allowed tags are passed through?

-- JohnRouillard - 31 Dec 2002
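
One possible answer to the third question, as a sketch only: a whitelist filter built on Python's standard html.parser module. The tag list is an assumption made up for illustration, not a vetted set, and the filter drops every attribute so that no attribute can smuggle script through. Tags outside the list are removed but their text is kept; script and style elements lose their content as well.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist, for illustration only; a real list needs careful review.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "pre"}

# Elements whose content is discarded along with the tag itself.
DROP_CONTENT = {"script", "style"}


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # all attributes are dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so it cannot be read back as markup.
            self.out.append(escape(data, quote=False))


def filter_html(text):
    """Return text with only whitelisted, attribute-free tags left in place."""
    parser = WhitelistFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Because the output is rebuilt from parser events rather than edited in place, anything the parser does not recognise as an allowed tag never reaches the saved topic.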

You have to do more than just filter tags: you also need to look at event handlers on tags (onclick, onmouseover, etc.). These can occur on just about any tag. Fortunately, the event handler content must either include a script specifier (like 'javascript:do evil stuff here') or make a call to code defined elsewhere within <script> tags or linked in with a LINK tag.

-- DaleBrayden - 11 Mar 2003
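
A minimal sketch of such an attribute check, in Python for illustration. It rejects every on* attribute outright rather than inspecting the handler content, since the value of an event handler attribute is run as script whether or not it carries a 'javascript:' prefix. The attribute and scheme lists are assumptions, and the value is assumed to have had its character entities decoded already.

```python
import re

# Assumed whitelists, for illustration only.
ALLOWED_ATTRS = {"href", "title", "alt", "name"}
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# A URL scheme is a letter followed by letters, digits, '+', '.' or '-', then ':'.
SCHEME = re.compile(r"^([a-z][a-z0-9+.\-]*):")


def is_safe_attribute(name, value):
    """Return True if the attribute may be kept on a whitelisted tag."""
    name = name.lower()
    if name.startswith("on"):  # onclick, onmouseover, ...
        return False
    if name not in ALLOWED_ATTRS:
        return False
    if name == "href":
        # Remove whitespace and control characters before looking for a
        # scheme, so that "java\tscript:" cannot slip past the check.
        compact = re.sub(r"[\x00-\x20]+", "", value or "").lower()
        match = SCHEME.match(compact)
        if match and match.group(1) not in ALLOWED_SCHEMES:
            return False
    return True
```

Relative links such as href="foo" have no scheme and pass; anything with a scheme outside the assumed list, including javascript:, is refused.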

These may be of some interest:



...HTML filtering is a complicated problem, and you need to consider what are safe and bad attributes as well as what are safe and bad tags.

-- NickCleaton - 11 Mar 2003

Nick's HTMLFilter sounds very useful - since it's whitelist based it sounds like it should be quite safe. I'd like to see if it can be used when saving a page, to avoid the performance overhead of running it on every page view.

-- RichardDonkin - 15 Mar 2003

Not only that, but filtering the raw text as it's saved avoids the problem that a rendering-time plugin would have: the filtering plugin would have to run before all other plugins, to avoid undoing what the other plugins (and twiki's own rendering code) have done.

At the risk of re-opening a discussion that may have been covered at DisableHTML or SanitisingHTML, it seems to me that a worthy goal would be

  1. Make enough TWiki syntax to make embedding of html unnecessary, and
  2. Provide a twiki configuration option that outright disables html input

The 2nd part of the goal would allow us to add the strip-html-during-save directly into the TWiki core. I think that the first part of the goal is mostly achieved already - the only thing I sometimes miss is the ability to express an href that opens in another window (i.e. <a href="foo" target="win2">). I'm sure there are other constructs that can't be expressed, but surely these are all things that could be done with plugins and extended syntax?

-- DaleBrayden - 15 Mar 2003
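
A rough sketch of what the second goal might look like at save time, in Python for illustration. Instead of deleting tags it neutralises the '<' that opens them, so the saved text displays the markup literally. The function name is hypothetical, and exempting only TWiki's own <verbatim> and <nop> tags is an assumption; other TWiki markup that uses angle brackets would need the same treatment.

```python
import re

# Matches a '<' that opens a tag, end tag, comment or declaration, unless the
# tag is one of TWiki's own (assumed here to be just verbatim and nop).
TAG_OPEN = re.compile(
    r"<(?!/?(?:verbatim|nop)\b)(?=[/!?A-Za-z])", re.IGNORECASE
)


def disable_html(text):
    """Return text in which html tags can no longer be parsed as markup."""
    return TAG_OPEN.sub("&lt;", text)
```

Only '<' is rewritten, never '&', so saving the same topic again does not escape it a second time, and a bare comparison such as 1 < 2 is left alone.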

DaleBrayden said:

    it seems to me that a worthy goal would be

      1. Make enough TWiki syntax to make embedding of html unnecessary, and
      2. Provide a twiki configuration option that outright disables html input

I claim that we will not and should not totally eliminate html/javascript.

The nice thing about TWiki is that 80% of the work can be done without having to know HTML. This makes it easier for people to use. But advanced HTML items like forms and javascript (which raises the requirements bar but may be suitable for some intranets) also make things easier to use when you are following a process, e.g. the bug creation pages.

Now why spend time re-inventing syntax for forms or javascript when it will be used less than 20% of the time? I suspect that only TWiki developers and advanced users will use the html, since its features are usually required for adding process rather than information to pages.

That said, should we provide safeguards against malicious html/javascript? Certainly. But why reinvent HTML for the last 20% or less of operations?

-- JohnRouillard - 17 Mar 2003

OK - fair enough. I don't use forms much on either of my TWiki sites, so I tend to forget how useful they are. Still, it seems to me that form definition is not something that a TWiki user needs. I see this as akin to the definition of templates - we put our templates in a directory where they cannot be updated by twiki users.

It's not quite right that only TWiki developers and advanced users will use html - unless you include hackers and hacker wannabes as advanced users. This proposal about eliminating html has been made before by other people (e.g. GetRidOfJavaScript), and has met with vehement opposition before. Maybe there are two types of people involved in this discussion: those who have been hacked and those who haven't.

Anyway - if the HtmlFilterPlugin provides the AllowJavaScript preference variable, as defined at the top of this topic, then my concerns are fully addressed.

Just one other note: someone at c2 recently said something to the effect that "hacking wiki is about as intellectually challenging as hacking a corkboard - so why bother?" And it does seem to be true that wiki sites are hacked less often and less destructively than, say, phpWebSite or phpNuke sites. The sad corollary to this is that any effort to hack-proof twiki had better be genuinely hack-proof, or it will become a challenging target for the defacement sub-culture.

-- DaleBrayden - 17 Mar 2003

Topic revision: r10 - 2003-03-17 - DaleBrayden
Copyright © 1999-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.