MS Word to TWiki Markup Language Add-On
When a migration to TWiki is planned but many documents already exist in MS Word .doc format, converting them manually is tedious. Also, casual contributors like to use their favorite text editor, which in many cases is MS Word.
Several plugins have attempted to overcome this problem; the best known is TWiki:Plugins.MsOfficeAttachmentsAsHTMLPlugin. However, with that approach the documents are not editable in the wiki fashion, i.e. 'click-edit-save'.
This simple VBA script can convert a .doc to TWiki:Codev.TWikiML. It's far from complete, but handles the basics.
Usage
- User edits Word file
- User saves Word file for future reference
- User presses Alt+F8 or clicks Tools-Macros > Word2TWiki
- Text in Word file is converted to TWikiML, and is copied to the clipboard
- Document is saved as filtered htm (an additional document alongside the previously saved Word document)
- Inline images are collected in folder ActiveDocumentPath\YourFileName_files
- Explorer opens the images folder
- User pastes data into TWiki (Ctrl + V or Edit -> Paste) and saves topic
- User uploads all images from the folder. (Recommended: use TWiki:Plugins.BatchUploadPlugin to simply upload all the images as a zip file that gets unzipped by the plugin on the topic)
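The final upload step needs a zip file of the images folder. The following is a minimal sketch of one way to build it, written in Python and run outside Word; it is not part of the macro. The function name `zip_image_folder` and the archive name are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def zip_image_folder(folder, zip_path):
    """Pack every file directly inside `folder` into a flat zip archive."""
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(folder.iterdir()):
            if item.is_file():
                # Store only the file name, so the images are not nested
                # inside a subfolder of the archive.
                archive.write(item, arcname=item.name)
    return Path(zip_path)
```

For example, `zip_image_folder(r"C:\Docs\YourFileName_files", r"C:\Docs\images.zip")` would produce one archive to attach to the topic. Write the archive outside the images folder so that it does not end up packing itself.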
Features
Currently, it handles conversions for the following:
- Headings(1-6)
- Italics
- Underline
- Typewriter font (Courier New)
- Bold (also combined with Italics, Underline, Typewriter font)
- Text color (see "Colored text" in TWikiPreferences#Rendering_Shortcuts)
- (Nested) Bullet lists
- (Nested) Numbered lists (No fancy styles or continuations)
- %BR% is added before linebreaks (paragraph breaks are not touched)
- Regular Tables, with or without merged cells, rowspans and colspans
- Partially handles inline images
All other objects or features are left untouched.
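To make the target markup concrete, here is a small Python sketch of the TWiki syntax that some of these conversions aim for. It is an illustration only, not the macro's VBA code; the function names are invented for this example, and underline and text color are left out.

```python
def heading(level, text):
    """TWiki heading: '---' followed by one '+' per level (1-6)."""
    if not 1 <= level <= 6:
        raise ValueError("TWiki supports heading levels 1 to 6")
    return "---" + "+" * level + " " + text


def inline(text, bold=False, italic=False, fixed=False):
    """Wrap text in TWiki inline markers, keeping outer spaces outside them."""
    core = text.strip()
    if not core:
        return text  # nothing to format, so no empty '* *' markers are emitted
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if bold and italic:
        core = "__" + core + "__"   # bold italic
    elif bold and fixed:
        core = "==" + core + "=="   # bold fixed font
    elif bold:
        core = "*" + core + "*"
    elif italic:
        core = "_" + core + "_"
    elif fixed:
        core = "=" + core + "="     # typewriter (fixed) font
    return lead + core + trail


def list_item(text, depth=1, numbered=False):
    """TWiki list item: three spaces per nesting level, then '*' or '1.'."""
    marker = "1." if numbered else "*"
    return "   " * depth + marker + " " + text
```

For instance, `heading(2, "Usage")` gives `---++ Usage`, and `list_item("nested", depth=2)` gives a bullet indented by six spaces. Moving leading and trailing spaces outside the markers matters because TWiki only recognizes `*bold*` when the markers touch the text.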
WIBNIFs
It would be really nice if somebody could try to improve this Add-On to:
| Feature | Status |
| Replace all shapes (images, word art, Word equations, etc) with links to %ATTACHURL%/image1.gif. The numbering should be consistent with the generated image numbering after saving the .doc as 'HTML, filtered' in MS Word. | Partially implemented. Images are collected, document is saved as htm. Links are modified. Resized images, word art, and other untested objects are not yet supported. |
| Actually save all shapes from within the macro, so the numbering is known and no guesses need to be made | Completed in htm files folder |
| Detect if a TOC is present in the .doc and replace it with %TOC% | Not yet implemented |
| fix the Known Problems | Some fixed |
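The image-link rewriting described in the first row can be pictured with the Python sketch below. It assumes that the filtered HTML references images as `<img ... src="YourFileName_files/image001.gif">` with a double-quoted `src` attribute; the function name is hypothetical, and the macro itself may do this differently.

```python
import re


def point_images_at_attachments(html):
    """Rewrite every img src to %ATTACHURL%/<file name>, dropping the folder."""
    def rewrite(match):
        # Keep only the last path component (handles / and \ separators).
        file_name = re.split(r"[\\/]", match.group(2))[-1]
        return match.group(1) + "%ATTACHURL%/" + file_name + match.group(3)

    pattern = r'(<img\b[^>]*?\bsrc=")([^"]*)(")'
    return re.sub(pattern, rewrite, html, flags=re.IGNORECASE)
```

Because only the folder part is removed, the image file names stay identical to the ones Word generated, which keeps the links consistent with the files that get uploaded.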
Known Problems
- Has been known to cause Word to hang altogether (at least partly fixed in version 1.1, please provide feedback and test cases (word docs) if you still notice this)
- If (lines in) a table cell are bold or italic but contain no text, the macro still inserts * * or _ _.
- Formatted pictures. For some reason, pictures that have been resized inside Word are not recognized as pictures and therefore are not saved in the htm image folder.
- Pictures inside table cells. They are not kept inside a table cell.
- Numeric bullet lists inside Table cells. All the numbers get reset to "1"
- WordArt textboxes get copied, but appear at random positions in the text.
- Right-aligned images with text flowing on the left do not keep that layout.
- Single paragraph breaks still exist in the converted TWiki source but disappear when TWiki renders the topic. Double (or more) paragraph breaks create a paragraph break in TWiki. A linebreak in Word [Shift-Enter] gets an additional %BR% during conversion.
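The break handling in the last item can be modelled with a short Python sketch. It assumes that manual line breaks arrive as the vertical tab character (the character Word uses for Shift-Enter in document text) and that paragraph breaks have already been normalized to `\n`. The second function is only a rough model of TWiki rendering, not its real renderer, and both function names are invented.

```python
import re


def convert_breaks(word_text):
    """Put %BR% in front of each manual line break; paragraph breaks stay as-is."""
    return word_text.replace("\v", "%BR%\n")


def rendered_paragraphs(twiki_source):
    """Rough model: blank lines split paragraphs, single newlines collapse."""
    blocks = re.split(r"\n{2,}", twiki_source.strip())
    return [" ".join(block.split("\n")) for block in blocks]
```

With the input `"one\ntwo\n\nthree\vfour"`, the single break between "one" and "two" collapses into one rendered paragraph, the double break starts a new paragraph, and the manual line break survives as `%BR%`.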
Add-On Installation Instructions
Note: Unlike many TWiki Add-Ons, this one is not installed on the server; it is a macro to be installed in MS Word.
- Download the .BAS file from the Add-on Home (see below)
- If you have problems downloading the .BAS file, try the .ZIP version.
- Launch Microsoft Word, go to Tools | Macro | Visual Basic Editor (Alt+F11)
- Right-click the Normal project (within the Project Explorer window; if you don't see that window, go to View | Project Explorer (Ctrl-R)) and do an Insert | Module
- Use the context menu or File menu to select Import file... and pick the downloaded .BAS file.
- Go to File | Save Normal, then File | Close and Return to Microsoft Word
Please note that from version 1.400, this macro requires MS Excel in order to handle merged table cells. Since most people/corporations who own Word also own Excel, this decision was believed to be acceptable. If you do not own MS Excel but do want to use this macro, replace the ConvertTable() subfunction with the function in version 1.310 (the Add On will then fail on encountering tables with merged cells).
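For orientation, the Python sketch below shows how merged cells can be written in TWiki table syntax. It is not the macro's ConvertTable() code and does not involve Excel. It assumes the table has already been read into a grid in which merged positions are marked with two hypothetical placeholders, and that rowspans use the `^` marker handled by TWiki's TablePlugin.

```python
COLSPAN = object()  # placeholder: this cell is merged with the cell to its left
ROWSPAN = object()  # placeholder: this cell is merged with the cell above it


def table_to_twiki(grid):
    """Render a grid of cell values as TWiki table rows."""
    lines = []
    for row in grid:
        line = "|"
        for cell in row:
            if cell is COLSPAN:
                line += "|"        # an extra bar widens the previous cell
            elif cell is ROWSPAN:
                line += " ^ |"     # '^' continues the cell above
            else:
                line += " " + str(cell) + " |"
        lines.append(line)
    return "\n".join(lines)
```

A grid such as `[["A", COLSPAN, "B"], ["C", "D", ROWSPAN]]` becomes the two rows `| A || B |` and `| C | D | ^ |`, where "A" spans two columns and "B" spans two rows.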
Add-On Info
- Set SHORTDESCRIPTION = Visual Basic script to convert Microsoft Word documents to the TWiki markup language
| Add-on Author: | TWiki:Main/JosMaccabiani, TWiki:Main/MerlijnVanDeen, TWiki:Main/MikaelOlenfalk, TWiki:Main/PabloCaskey, TWiki:Main/TouseefLiaqat, TWiki:Main/DougClaar, TWiki:Main/MiloValenzuela, TWiki:Main/AlexanderStedile |
| Add-on Version: | 25 Jul 2007 (v1.481) |
| Change History: | |
| 06 Aug 2007: | v1.482: TWiki:Main/AlexanderStedile Bug fixes, uppercase and spacing issues. |
| 26 Jul 2007: | v1.481: TWiki:Main/AlexanderStedile fixed converting hyper links by changing conversion step execution order. |
| 25 Jul 2007: | v1.480: TWiki:Main/AlexanderStedile added converting text color (named Word/TWiki colors), added converting linebreaks (paragraphs are not touched). |
| 23 Jul 2007: | v1.470: TWiki:Main/AlexanderStedile added conversion for underline, typewriter font, combinations with bold. Refactored and removed some copy&paste code. |
| 26 Apr 2007: | v1.460: TWiki:Main/DougClaar merged the bugfixes of 1.4.4 back in. They got lost in 1.4.5 |
| 25 Feb 2007: | v1.450: Added support for inline images by saving the document as htm, which collects the images in a folder, and by further modifying the links to them |
| 26 May 2006: | v1.440: Fixed the bug in which the macro hangs while converting links; also corrected the order of text and address. (thanks Touseef) |
| 06 Apr 2006: | v1.430: handle nested lists even better (plus small bugfix) (thanks Pablo) |
| 19 Sep 2005: | v1.410: More robust and elegant handling of merged cells (thanks Merlijn) |
| 18 Sep 2005: | v1.400: Tables with merged cells are supported (thanks Merlijn) |
| 22 Aug 2005: | v1.310: Small bugfix, removed double variable declaration in sub ConvertLists. |
| 22 Aug 2005: | v1.300: Correct conversion of nested bullet- and numbered lists. (thanks Mikael) |
| 06 Aug 2005: | v1.200: Better conversion of bold and italic formatting by correct handling of trailing and leading formatted spaces. |
| 05 Aug 2005: | v1.100: Fixes bug where Word hangs if formatting (bold/italics) is applied to the paragraph mark at the end of a line that is contained in a bullet-list. |
| 08 Jul 2005: | v1.000: Initial version |
| CPAN Dependencies: | none |
| Other Dependencies: | Requires MS Word and MS Excel (http://www.microsoft.com) |
| Perl Version: | n/a |
| License: | GPL |
| Add-on Home: | http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiMLAddOn |
| Feedback: | http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiMLAddOnDev |
| Appraisal: | http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiMLAddOnAppraisal |
Acknowledgments
Version 1.1 of this Add On was more than heavily based on / directly copied from:
Version 1.4.5 handling of inline images is based on a lightly modified code from
Related Topic: TWikiAddOns
--
TWiki:Main/JosMaccabiani - 09 Jul 2005