
Convert MS Word Document to TWiki Markup (Windows OS Only)


  • Use VBA to convert multiple MS Word documents to HTML files, then use a Perl script to convert any number of HTML files, in any number of sub-folders, to TWiki markup text files. The script searches the sub-folders recursively.



Step 1: Convert Word Document to HTML File

  • Open "ConvertWordToHTML.doc" from C:\Word2TWiki. (If the buttons do not work, enable macros first.)
  • Use one of the following options to convert Word documents to HTML files. The HTML files are saved in C:\Word2TWiki\DocHTML.
    • Click the "Convert All Word Documents From Folder" button if you have copied the Word documents to C:\Word2TWiki\DocWord.
    • Click the "Convert All Opened Files (Exclude Me)" button if the Word documents are already open in Word.

Step 2: Convert HTML File to TWiki Markup

From command line:

>perl "HTML(MSWord)2TWikiMarkup.pl" [/c][/a][/v][/i|h|help]

/c  Copyright

/a  Author

/v  Version

/i, /h, /help  Generate this document in TWikiML

Or double-click "HTML(MSWord)2TWikiMarkup.pl" in C:\Word2TWiki.

  • TWiki markup text files ("_FileTitle.txt") are saved in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
  • The same folder contains a zip file that holds all of the images.
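
The recursive search and the output naming described above can be sketched as follows. This is a minimal Python illustration, not the add-on's Perl code. The function name `plan_outputs` is hypothetical, and the sketch assumes that FileTitle is the HTML file name without its extension and that the twiki_FileTitle folder sits next to the HTML file.

```python
import os


def plan_outputs(root):
    """Walk root recursively and map each HTML file to a planned TWiki text file.

    Assumptions (not confirmed by the add-on): FileTitle is the HTML file name
    without its extension, and twiki_FileTitle is created beside the HTML file.
    """
    plan = {}
    for folder, subfolders, files in os.walk(root):
        # Skip output folders from earlier runs so they are not searched again.
        subfolders[:] = [d for d in subfolders if not d.startswith("twiki_")]
        for name in sorted(files):
            title, ext = os.path.splitext(name)
            if ext.lower() in (".htm", ".html"):
                source = os.path.join(folder, name)
                target = os.path.join(folder, "twiki_" + title, "_" + title + ".txt")
                plan[source] = target
    return plan
```

Calling `plan_outputs(r"C:\Word2TWiki\DocHTML")` would return a dictionary from each HTML file found to the text file path planned for it, without writing anything to disk.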

Step 3: Proofread TWiki Markup Documents

  • Open the text file "_FileTitle.txt" in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
  • Edit the title, author and issue date as necessary.
  • Remove any unneeded lines at the beginning of the file.
  • Proofread the file and correct any errors.



TWikiML Features

  • Headings. The heading levels included in the TOC can be changed in the configuration file.
  • Italic, underline, bold and bold italic text.
  • HTML links, email links and cross references.
  • Unicode for symbols. (Refer to "Unicode alternatives for Greek and special characters in HTML".)
  • Text color.
  • Simple bullet lists (" * ") and numbered lists (" 1. " and " a. "). Levels are preserved if they can be detected.
  • Tables, with or without rowspans and colspans.
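
To make a few of these mappings concrete, here is a minimal Python sketch that turns headings, bold, italic, links and nested lists into TWiki markup. It is an illustration only and does not reproduce the add-on's Perl script. The class and function names are hypothetical, and tables, text color, underline, combined bold italic and lettered lists are left out.

```python
import re
from html.parser import HTMLParser

# TWiki inline markers: *bold* and _italic_.
INLINE = {"b": "*", "strong": "*", "i": "_", "em": "_"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TWikiSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []      # output fragments, joined at the end
        self.lists = []    # stack of open list types: "ul" or "ol"
        self.href = None   # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in HEADINGS:
            # <h2> becomes "---++ ", one "+" per heading level.
            self.out.append("\n---" + "+" * int(tag[1]) + " ")
        elif tag in ("ul", "ol"):
            self.lists.append(tag)
        elif tag == "li" and self.lists:
            marker = "* " if self.lists[-1] == "ul" else "1. "
            # TWiki indents list items by three spaces per level.
            self.out.append("\n" + "   " * len(self.lists) + marker)
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[[" + self.href + "][")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            if not self.lists:
                self.out.append("\n")
        elif tag == "a" and self.href:
            self.out.append("]]")
            self.href = None
        elif tag in HEADINGS or tag == "p":
            self.out.append("\n")

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)
        # Drop leading blanks at the start of a line or right after a marker.
        if not self.out or self.out[-1].endswith(("\n", " ")):
            text = text.lstrip()
        if text:
            self.out.append(text)


def html_to_twiki(html):
    parser = TWikiSketch()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.out).split("\n")
    return "\n".join(line.rstrip() for line in lines).strip("\n")
```

For example, `html_to_twiki("<h2>Setup</h2><p>Use <b>bold</b> text.</p>")` yields a `---++ Setup` heading line followed by `Use *bold* text.`.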

Other HTML Tags

  • Other HTML tags are preserved, but only a minimal set is kept. Users can add or delete tags in the [Preserve tags] section of the configuration file.
  • Custom bullet and number list styles are supported. Refer to the [CustomerStyleDefinition] section of the configuration file.
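
As a rough illustration of a list-style configuration section, the Python sketch below reads the entries of a [Preserve tags] section. The layout of the real _HTML(MSWord)2TWikiMarkup.ini file is not documented here, so the sample content (one tag name per line) and the helper name `read_list_section` are assumptions, not the add-on's actual format.

```python
import configparser

# Assumed layout: one tag name per line. The real .ini file may differ.
SAMPLE_INI = """\
[Preserve tags]
sup
sub
pre
"""


def read_list_section(ini_text, section):
    """Return the entries of a section whose lines carry no values."""
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.optionxform = str  # keep entries exactly as written
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return []
    return list(parser[section].keys())


preserved = read_list_section(SAMPLE_INI, "Preserve tags")
```

With the sample above, `preserved` holds the three tag names in file order, and a missing section yields an empty list.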

Perl Script Features

  • Images are zipped. No images are lost.
  • HTML files in sub-folders are processed recursively.
  • The output is clean, well-formatted text that needs minimal proofreading and correction.
  • Hidden text is removed.
  • Special text is removed. This text is defined in the [Text To Be Removed] section of the configuration file. Users should edit this section.
  • Title, author and date are collected. The title must be in the first few lines of the document, followed by the authors and the date.
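
The image-zipping feature can be sketched in Python as follows. This is not the add-on's Perl implementation. The function name `zip_images` and the set of image extensions are assumptions chosen for illustration.

```python
import os
import zipfile

# Assumed set of image types; adjust to what the documents actually contain.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".bmp"}


def zip_images(source_dir, zip_path):
    """Add every image file under source_dir to zip_path and return the stored names."""
    stored = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subfolders, files in os.walk(source_dir):
            for name in sorted(files):
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    full = os.path.join(folder, name)
                    # Store paths relative to source_dir so the archive is portable.
                    arcname = os.path.relpath(full, source_dir)
                    archive.write(full, arcname)
                    stored.append(arcname)
    return stored
```

Comparing the returned list with the images referenced in the HTML is one way to check that none were lost.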

Software Requirement - ActivePerl for Windows


Perl Script and ConvertWordToHTML.doc Installation

  • Download the MsWordToTWikiOnWindowsAddOn.zip file.
  • Open MsWordToTWikiOnWindowsAddOn.zip and extract all files, including the directory structure. The following files are installed:

| File | Directory | Comment |
| ConvertWordToHTML.docm | C:\Word2TWiki | Used for MS Office Word 2007. |
| ConvertWordToHTML.doc | C:\Word2TWiki | Used for earlier versions of MS Office Word. |
| HTML(MSWord)2TWikiMarkup.pl | C:\Word2TWiki | Script that converts HTML (MS Word) documents to TWiki markup. |
| _HTML(MSWord)2TWikiMarkup.ini | C:\Word2TWiki | Configuration file that controls the Perl script. Users must edit this file before running the script. |

Requirements for the Word Document


To reduce conversion errors and the time spent proofreading and editing, the Word document should follow these guidelines:

  • Use headings.
  • Use captions.
  • Use bullet and numbered lists.
  • Use endnotes and footnotes.
  • Use cross references.
  • Do not insert a table within a table cell.
  • Do not add free shapes on top of an existing image.
  • Word drawings, including text boxes, have to be reformatted as images (gif, jpg, bmp, png, etc.).

Add-On Info

  • Set SHORTDESCRIPTION = Convert MS Word Document to TWiki Markup (Windows OS Only)

Add-on Author: TWiki:Main.CharlieMao
Copyright: © 2009 TWiki:Main.CharlieMao
License: GPL (GNU General Public License)
Add-on Version: 2009-06-23 (V1.000)
Change History:  
2009-06-23: Initial version
TWiki Dependency: $TWiki::Plugins::VERSION 1.1 (TWiki 4.0)
CPAN Dependencies: none
Other Dependencies: none
Perl Version: 5.005
Add-on Home: http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiOnWindowsAddOn
Feedback: http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiOnWindowsAddOnDev
Appraisal: http://TWiki.org/cgi-bin/view/Plugins/MsWordToTWikiOnWindowsAddOnAppraisal

-- CharlieMao - June 23, 2009

Topic revision: r8 - 2013-10-16 - PeterThoeny