r30 - 08 Aug 2007 - 15:11:22 - AlexanderStedile

MS Word to TWiki Markup Language Add-On

When a migration to TWiki is planned but many documents already exist in MS Word .doc format, converting them manually is tedious. Casual contributors also like to use their favorite text editor, which in many cases is MS Word. Several plugins have attempted to address this problem; the best known is TWiki:Plugins.MsOfficeAttachmentsAsHTMLPlugin. With that approach, however, the documents cannot be edited in the wiki fashion, i.e. 'click-edit-save'.

This simple VBA script can convert a .doc to TWiki:Codev.TWikiML. It's far from complete, but handles the basics.

Usage

  1. User edits Word file
  2. User saves Word file for future reference
  3. User presses Alt+F8, or goes to Tools | Macro | Macros, and runs Word2TWiki
  4. Text in the Word file is converted to TWikiML and copied to the clipboard
  5. The document is also saved as filtered HTML (.htm), in addition to the previously saved Word document
  6. Inline images are collected in the folder ActiveDocumentPath\YourFileName_files
  7. Explorer opens the images folder
  8. User pastes the data into a TWiki topic (Ctrl+V or Edit | Paste) and saves the topic
  9. User uploads all images from the folder. (Recommended: use TWiki:Plugins.BatchUploadPlugin to upload all the images as a single zip file, which the plugin unzips on the topic)

Features

Currently, it handles conversions for the following:
  • Headings (1-6)
  • Italics
  • Underline
  • Typewriter font (Courier New)
  • Bold (also combined with Italics, Underline, Typewriter font)
  • Text color (see "Colored text" in TWikiPreferences#Rendering_Shortcuts)
  • (Nested) Bullet lists
  • (Nested) Numbered lists (No fancy styles or continuations)
  • %BR% is added before linebreaks (paragraph breaks are not touched)
  • Regular Tables, with or without merged cells, rowspans and colspans
  • Partially handles inline images

All other objects or features are left untouched.
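To make the target markup concrete, here is a minimal Python sketch of how a few of the features above map to TWiki markup. It is not the add-on's VBA code: the function names (heading, inline, list_item) are hypothetical, and only the standard TWiki text formatting rules are assumed. Underline is left out because this page does not say which markup the macro emits for it.

```python
def heading(text, level):
    """TWiki heading: '---' followed by one '+' per level (1-6)."""
    if not 1 <= level <= 6:
        raise ValueError("TWiki supports heading levels 1-6")
    return "---" + "+" * level + " " + text


def inline(text, bold=False, italic=False, fixed=False, color=None):
    """Wrap text in TWiki inline markers (hypothetical helper)."""
    if bold and italic:
        text = "__" + text + "__"      # bold italic
    elif bold and fixed:
        text = "==" + text + "=="      # bold fixed (typewriter) font
    elif bold:
        text = "*" + text + "*"
    elif italic:
        text = "_" + text + "_"
    elif fixed:
        text = "=" + text + "="
    if color:
        # Named colors use the rendering shortcuts, e.g. %RED% ... %ENDCOLOR%.
        text = "%" + color.upper() + "%" + text + "%ENDCOLOR%"
    return text


def list_item(text, depth=1, numbered=False):
    """Bullet or numbered item; each nesting level adds three spaces."""
    marker = "1." if numbered else "*"
    return "   " * depth + marker + " " + text
```

For example, heading("Overview", 2) gives "---++ Overview", and list_item("nested", depth=2) gives a bullet indented by six spaces.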

WIBNIFs

It would be really nice if somebody could try to improve this Add-On to:
| Feature | Status |
| Replace all shapes (images, WordArt, Word equations, etc.) with links to %ATTACHURL%/image1.gif. The numbering should be consistent with the image numbering generated when saving the .doc as 'HTML, filtered' in MS Word. | Partially implemented. Images are collected, the document is saved as htm, and links are modified. Resized images, WordArt, and other untested objects are not yet supported. |
| Actually save all shapes from within the macro, so the numbering is known and no guesses need to be made | Completed in htm files folder |
| Detect if a TOC is present in the .doc and replace it with %TOC% | Not yet implemented |
| Fix the Known Problems | Some fixed |
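The image-link replacement in the first row could look roughly like the following Python sketch. It is an illustration, not the macro's VBA: the function name rewrite_image_links is hypothetical, and it assumes that image references in the saved HTML have the form src="YourFileName_files/image001.gif".

```python
import re


def rewrite_image_links(html, doc_name):
    """Point image sources in '<doc_name>_files/' at the topic's attachments."""
    folder = re.escape(doc_name + "_files")
    # Match src="<doc_name>_files/<file>" with either slash direction.
    pattern = re.compile(r'src="' + folder + r'[/\\]([^"]+)"', re.IGNORECASE)
    # A function is used as the replacement so the file name is inserted literally.
    return pattern.sub(lambda m: 'src="%ATTACHURL%/' + m.group(1) + '"', html)
```

Because the file names are kept as they appear in the saved HTML, the links match the images once the whole folder has been uploaded to the topic.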

Known Problems

  • Has been known to cause Word to hang altogether (at least partly fixed in version 1.1; please provide feedback and test cases (Word docs) if you still notice this)
  • If a table cell (or lines in it) is formatted bold or italic but doesn't actually contain any text, the macro still inserts "*  *" or "_  _".
  • Formatted pictures. For some reason, pictures that have been resized inside Word are not recognized as pictures and are therefore not saved in the htm image folder.
  • Pictures inside table cells. They are not kept inside the table cell.
  • Numbered lists inside table cells. All the numbers get reset to "1".
  • WordArt text boxes get copied, but appear at random positions in the text.
  • Right-aligned images with text on the left do not keep their layout.
  • Single paragraph breaks still exist in the converted TWiki source but disappear when TWiki renders the topic. Double (or more) paragraph breaks create a paragraph break in TWiki. A line break in Word (Shift+Enter) gets an additional %BR% during conversion.
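The empty-cell problem in the list above, and the leading/trailing-space handling listed for v1.200 in the change history, both come down to where the formatting markers are placed. A minimal Python sketch of one way to guard against them follows. It is an assumption about a possible fix, not the macro's actual VBA, and wrap_formatted is a hypothetical helper.

```python
def wrap_formatted(text, marker):
    """Wrap text in a TWiki marker such as '*' or '_', keeping whitespace outside."""
    core = text.strip()
    if not core:
        # Nothing visible to format: return the text unchanged instead of '*  *'.
        return text
    lead = text[: len(text) - len(text.lstrip())]   # leading whitespace
    trail = text[len(text.rstrip()):]               # trailing whitespace
    return lead + marker + core + marker + trail
```

With this guard, a bold run that contains only spaces is passed through untouched, and " bold " becomes " *bold* " so the markers sit directly against the text.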

Add-On Installation Instructions

Note: Unlike most TWiki add-ons, this one is not installed on the server; it is a macro that is installed in MS Word.

  • Download the .BAS file from the Add-on Home (see below)
    • If you have problems downloading the .BAS file, try the .ZIP version.
  • Launch Microsoft Word and go to Tools | Macro | Visual Basic Editor (Alt+F11)
  • Right-click the Normal project in the Project Explorer window (if you don't see that window, go to View | Project Explorer (Ctrl+R)), then choose Insert | Module
  • Use the context menu or the File menu to select Import file... and pick the downloaded .BAS file
  • Go to File | Save Normal, then File | Close and Return to Microsoft Word

Please note that from version 1.400, this macro requires MS Excel in order to handle merged table cells. Since most people/corporations who own Word also own Excel, this decision was believed to be acceptable. If you do not own MS Excel but do want to use this macro, replace the ConvertTable() subfunction with the function in version 1.310 (the Add On will then fail on encountering tables with merged cells).
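For readers unfamiliar with how merged cells are written in TWiki tables, the following Python sketch shows the column-span notation. It does not reproduce ConvertTable(); the helper name table_row and its (text, colspan) input format are assumptions made for illustration.

```python
def table_row(cells):
    """Render one TWiki table row from (text, colspan) pairs."""
    row = "|"
    for text, colspan in cells:
        # A cell spanning n columns is closed by n consecutive '|' characters.
        row += " " + text + " " + "|" * colspan
    return row
```

A cell that spans three columns is rendered as "| multi span |||". Row spans are not covered by this sketch.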

Add-On Info

  • Set SHORTDESCRIPTION = Visual Basic script to convert Microsoft Word documents to the TWiki markup language

Add-on Author: TWiki:Main/JosMaccabiani, TWiki:Main/MerlijnVanDeen, TWiki:Main/MikaelOlenfalk, TWiki:Main/PabloCaskey, TWiki:Main/TouseefLiaqat, TWiki:Main/DougClaar, TWiki:Main/MiloValenzuela, TWiki:Main/AlexanderStedile
Add-on Version: 06 Aug 2007 (v1.482)
Change History:  
06 Aug 2007: v1.482: TWiki:Main/AlexanderStedile Bug fixes, uppercase and spacing issues.
26 Jul 2007: v1.481: TWiki:Main/AlexanderStedile fixed converting hyper links by changing conversion step execution order.
25 Jul 2007: v1.480: TWiki:Main/AlexanderStedile added converting text color (named Word/TWiki colors), added converting linebreaks (paragraphs are not touched).
23 Jul 2007: v1.470: TWiki:Main/AlexanderStedile added conversion for underline, typewriter font, combinations with bold. Refactored and removed some copy&paste code.
26 Apr 2007: v1.460: TWiki:Main/DougClaar merged the bugfixes of 1.4.4 back in. They got lost in 1.4.5
25 Feb 2007: v1.450: Added support for inline images by saving the document as htm, which collects the images in a folder, and by modifying the links to them
26 May 2006: v1.440: Fixed the bug in which the macro hangs while converting links; the order of link text and address is also corrected. (thanks Touseef)
06 Apr 2006: v1.430: handle nested lists even better (plus small bugfix) (thanks Pablo)
19 Sep 2005: v1.410: More robust and elegant handling of merged cells (thanks Merlijn)
18 Sep 2005: v1.400: Tables with merged cells are supported (thanks Merlijn)
22 Aug 2005: v1.310: Small bugfix, removed double variable declaration in sub ConvertLists.
22 Aug 2005: v1.300: Correct conversion of nested bullet- and numbered lists. (thanks Mikael)
06 Aug 2005: v1.200: Better conversion of bold and italic formatting by correct handling of trailing and leading formatted spaces.
05 Aug 2005: v1.100: Fixes bug where Word hangs if formatting (bold/italics) is applied to the paragraph mark at the end of a line that is contained in a bullet-list.
08 Jul 2005: v1.000: Initial version
CPAN Dependencies: none
Other Dependencies: Requires MS Word and MS Excel (http://www.microsoft.com)
Perl Version: n/a
License: GPL
Add-on Home: http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiMLAddOn
Feedback: http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiMLAddOnDev
Appraisal: http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiMLAddOnAppraisal

Acknowledgments

Version 1.1 of this Add On was more than heavily based on / directly copied from:

Version 1.4.5 handling of inline images is based on a lightly modified code from

Related Topic: TWikiAddOns

-- TWiki:Main/JosMaccabiani - 09 Jul 2005

Topic attachments
| Attachment | Size | Date | Who | Comment |
| Word2TWiki.bas | 16.9 K | 06 Aug 2007 - 08:43 | AlexanderStedile | Version 1.482 - fixed uppercase, spacing. |
| Word2TWiki.zip | 4.3 K | 08 Aug 2007 - 11:26 | AlexanderStedile | Version 1.482 - fixed uppercase, spacing. - Zipped to work around download problem. |