A Spell Checker for TWiki

I whipped up a spell checker add-on for TWiki this weekend. I needed one because I've started using TWiki to communicate with my co-workers and I can't spell (as you'll likely discover reading this page). I read up on SpellCheckerPlugin and SpellCheckerPluginDev, but all of the ideas there seemed to require JavaScript. I didn't want to raise my browser requirements, because keeping them low is one of the principles that make TWiki great. The other ideas I read about all involved sending the contents of the edit box to a spell-checking window and then passing them back in.

Approach

My approach creates two new scripts, spellcheck and spellcheck-preview, modeled on edit and preview. spellcheck works like this:
  1. Escape things that don't need checking (hyperlinks, e-mail addresses, etc.) using a regular expression that wraps each such term in a tag like this: <SKIPCHECK badtext >.
  2. Split the text into checkable words ( [a-zA-Z\']+ ) and everything else. Anything enclosed in <>'s is stored without checking. Each checkable word is checked as it is split out. If the spell checker has no problem with it, it is treated as if it weren't checked.
  3. Merge all contiguous unchecked and correctly spelled text into single runs (to reduce form processing overhead) and convert each run into HTML: a hidden input tag holding its contents (HTML-escaped, with newlines turned into <TWSCBRK> tags to hack around CGI.pm's handling of leading newlines), followed by the text itself so the user can see it. Misspelled words are wrapped in a pair of controls borrowed from PeterThoeny's text box / drop-down box idea on SpellCheckerPluginDev. All of the inputs (hidden and visible) are given names that end in an index number to keep them in sequence during reconstruction.

When the user presses preview, spellcheck-preview reassembles the text by walking the index numbers in order. For each index:
  1. Look for a noc (hidden) input. If it's found, replace any <TWSCBRK> with newlines and add its value to the text.
  2. Otherwise, get the sug (drop-down) input and see if it's (manual). If so, get the usr (text box) value and add that to the text.
  3. Otherwise, add the sug value to the text.
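
The round trip described above can be sketched in a few lines. This is only an illustration in Python, not the add-on's Perl code: the KNOWN_WORDS set stands in for the real ispell lookup, the skip pattern for links and e-mail addresses is a guess, and the exact field names (noc0, sug1, usr1) and the initial state of the controls are assumptions; only the noc / sug / usr prefixes, the (manual) choice, and the <TWSCBRK> marker come from the description.

```python
import re

# Hypothetical stand-in for the dictionary lookup (the add-on itself asks ispell).
KNOWN_WORDS = {"the", "quick", "brown", "fox", "see", "for", "more"}

# Assumed skip pattern: bare URLs and e-mail addresses.
SKIP_RE = re.compile(r"(https?://[^\s<>]+|[\w.+-]+@[\w.-]+)")
# Anything in <>'s, or a checkable word as defined in step 2.
TOKEN_RE = re.compile(r"(<[^>]*>|[a-zA-Z']+)")
WORD_RE = re.compile(r"[a-zA-Z']+")


def is_correct(word):
    return word.lower() in KNOWN_WORDS


def segment(text):
    """Return ("noc", text) and ("bad", word) pairs in document order."""
    # Step 1: wrap terms that must not be checked.
    text = SKIP_RE.sub(lambda m: "<SKIPCHECK " + m.group(1) + " >", text)
    segments = []
    # Step 2: split into tags, checkable words, and everything else.
    for piece in TOKEN_RE.split(text):
        if not piece:
            continue
        kind = "noc"
        if piece.startswith("<SKIPCHECK ") and piece.endswith(" >"):
            piece = piece[len("<SKIPCHECK "):-2]  # unwrap, keep unchecked
        elif WORD_RE.fullmatch(piece) and not is_correct(piece):
            kind = "bad"
        # Step 3: merge contiguous unchecked / correctly spelled text.
        if kind == "noc" and segments and segments[-1][0] == "noc":
            segments[-1] = ("noc", segments[-1][1] + piece)
        else:
            segments.append((kind, piece))
    return segments


def to_fields(segments):
    """Name/value pairs as the form would submit them if the user changed nothing."""
    fields = {}
    for i, (kind, value) in enumerate(segments):
        if kind == "noc":
            fields[f"noc{i}"] = value.replace("\n", "<TWSCBRK>")
        else:
            fields[f"sug{i}"] = "(manual)"  # assumed default drop-down choice
            fields[f"usr{i}"] = value       # assumed: text box holds the original word
    return fields


def rebuild(fields):
    """Reassemble the text by walking the index numbers in order."""
    parts = []
    i = 0
    while True:
        if f"noc{i}" in fields:
            parts.append(fields[f"noc{i}"].replace("<TWSCBRK>", "\n"))
        elif f"sug{i}" in fields:
            sug = fields[f"sug{i}"]
            parts.append(fields.get(f"usr{i}", "") if sug == "(manual)" else sug)
        else:
            break
        i += 1
    return "".join(parts)
```

Because every run of text, checked or not, gets its own indexed field, the preview side never has to parse the page; it just concatenates values until it runs out of indexes.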

That's it! Look ma! No JavaScript!

Requirements

These scripts require the following CPAN modules:
  • HTML::Entities
  • Text::Ispell

A Note to RedHat Users: I installed this on my RedHat 7.2 box and found that there was no RH 7.2 package for Ispell. Don't be fooled by the ispell compatibility script provided by aspell. You need the real deal. I had to compile it from source.

A Note to Mandrake Users: I developed this on my Mandrake 9.0 box and found that the Text::Ispell CPAN module pointed to /usr/local/bin/ispell by default. You have to edit the Ispell.pm file and change this to /usr/bin/ispell.
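
If you aren't sure which of those two paths applies on your box, a quick check along these lines may help before you edit Ispell.pm. It is a Python sketch, separate from the add-on, and find_ispell is a name made up for the example. Note that it only finds an executable called ispell; it cannot tell the real ispell from the aspell compatibility script mentioned above.

```python
import os
import shutil


def find_ispell(candidates=("/usr/local/bin/ispell", "/usr/bin/ispell")):
    """Return the first executable ispell path found, or None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Fall back to searching the directories on PATH.
    return shutil.which("ispell")


if __name__ == "__main__":
    print(find_ispell() or "ispell not found")
```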

Package

The spellcheck package contains the following:
  • spellcheck and spellcheck-preview scripts
    These need to be added to your .htaccess file and configured like edit and preview to prevent unauthorized access.
  • a slightly modified view script whose only change is adding the spell check link to the toolbar
  • a modified version of Form.pm that adds a function called renderForHidden(), which renders the form as hidden tags so that its contents can be carried through the spellcheck script and handled by spellcheck-preview. The version I wrote on Saturday didn't do this, so it was erasing form contents.
  • spellcheck and spellcheck-preview templates, based on the edit and preview templates.
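
To give a rough idea of what rendering a form "as hidden tags" means, here is a minimal Python sketch. It is not the Perl code in the modified Form.pm; the function name render_for_hidden, the dictionary input, and the exact tag layout are all assumptions made for illustration.

```python
from html import escape


def render_for_hidden(form_fields):
    """Render form name/value pairs as hidden inputs, one per line."""
    lines = []
    for name, value in form_fields.items():
        # Escape both name and value so quotes and ampersands survive the round trip.
        lines.append(
            '<input type="hidden" name="{}" value="{}" />'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)
```

Carrying the values this way means the spell-check page submits them unchanged, so the preview step sees the same form data that edit would have sent.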

I hope this helps. I'll be testing it over the next few weeks. Please post any questions, comments, gripes, requests, insults, and offers of promotional merchandise to this page or to dboitnot@yahooPLEASENOSPAM.com. If you'd like to see the plugin in action you can go to http://www.lclinux.org which is one of the sites I'm testing it on.

Please use the Test web if you're just fiddling with the spell checker. That way I don't get updates every time someone does a test.

-- DanBoitnott - 23 Dec 2002

-- PeterThoeny - 23 Dec 2002

This looks very useful! Please have a look at the recent InternationalisationEnhancements work in TWikiAlphaRelease, and particularly the setupRegexes routine in CVS:lib/TWiki.pm. This avoids use of [A-Z] etc in regular expressions, to make it possible to support international character sets for accented characters, Cyrillic characters, and so on. Even if you don't support spell checking of languages other than English today, using the right regexes now would make it much easier to do so later.

There are quite a few regexes already defined in that routine, which should cover what you need, but you can of course write patches to TWiki.pm for new regexes if needed.
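
As an illustration of the difference, here is a small Python analogue (not the TWiki.pm regexes themselves, which are Perl): the character class below matches letters without hard-coding A-Z, so accented and Cyrillic words are picked up too. WORD_RE and checkable_words are names invented for the example, and the class is only roughly "letters": it is every word character except decimal digits and the underscore.

```python
import re

# One or more letters, optionally joined by apostrophes (it's, don't).
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def checkable_words(text):
    """Return the words a spell checker would be asked about."""
    return WORD_RE.findall(text)
```

With [a-zA-Z\']+ a word such as "café" would be split at the accented letter and the "caf" fragment flagged as misspelled.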

-- RichardDonkin - 23 Dec 2002

This is so far behind the curve, it's positively dangerous to install it! You should certainly not risk installing on any release from 2004 or later.

-- CrawfordCurrie - 12 Aug 2005

Topic attachments
Attachment | Size | Date | Who | Comment
SpellCheckAddOn.zip (r1) | 16.6 K | 2003-05-11 00:07 | DanBoitnott | SpellCheckAddOn package
SpellCheckGnuSkinTemplates.zip (r1) | 2.9 K | 2003-05-11 00:05 | DanBoitnott | GnuSkin templates for SpellCheckAddOn
Topic revision: r13 - 2008-10-10 - MartinCleaver
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.