Convert MS Word Document to TWiki Markup (Windows OS Only)
- Use VBA to convert multiple MS Word documents to HTML files, then use a Perl script to convert any number of HTML files in any number of sub-folders (found by recursive search) to TWiki markup text files.
Step 1: Convert Word Document to HTML File
- Open "ConvertWordToHTML.doc" from C:\Word2TWiki. (If the buttons do not work, enable macros first.)
- Use one of the following options to convert Word documents to HTML files, which are saved in C:\Word2TWiki\DocHTML.
- Click the "Convert All Word Documents From Folder" button if you have copied the Word documents to C:\Word2TWiki\DocWord.
- Click the "Convert All Opened Files (Exclude Me)" button if you have opened all the Word documents.
Step 2: Convert HTML File to TWiki Markup
From command line:
>perl "HTML(MSWord)2TWikiMarkup.pl" [/c][/a][/v][/i|h|help]
/[v] Version
/[i|h|help] Generate this document in TWikiML
Or double-click "HTML(MSWord)2TWikiMarkup.pl" in C:\Word2TWiki.
- TWiki markup text files ("_FileTitle.txt") are saved in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
- In that folder, there is a zip file that contains all images.
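The recursive search and the output layout described in Step 2 can be sketched in Python as follows. This is only an illustration, not the Perl script's actual code: the function names `find_html_files` and `twiki_output_path` are hypothetical, and the sketch assumes that each twiki_FileTitle sub-folder sits next to its HTML file and that both .htm and .html files count as HTML files.

```python
from pathlib import Path


def find_html_files(root: Path) -> list[Path]:
    """Return every .htm/.html file under root, searching all sub-folders."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".htm", ".html"}
    )


def twiki_output_path(html_file: Path) -> Path:
    """Map e.g. Report.html to twiki_Report/_Report.txt in the same folder.

    Assumption: the output sub-folder is created beside the HTML file.
    """
    title = html_file.stem
    return html_file.parent / f"twiki_{title}" / f"_{title}.txt"
```

Calling `find_html_files(Path(r"C:\Word2TWiki\DocHTML"))` and passing each result to `twiki_output_path` would list where each TWiki markup text file is expected under these assumptions.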
Step 3: Proofread TWiki Markup Documents
- Open the text file "_FileTitle.txt" in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
- Edit the title, author and issue date as necessary.
- Remove any unneeded lines at the beginning.
- Proofread the file and correct any errors.
Supported Word Features
- Headings. Users can change the heading levels included in the TOC in the configuration file.
- Italic, underline, bold, bold italic.
- HTML links, email links and cross references.
- Unicode for symbols. (Refer to Unicode alternatives for Greek and special characters in HTML)
- Text color.
- Simple bullet lists (" * ") and numbered lists (" 1. " and " a. "). Levels are preserved if they are detectable.
- Tables with or without rowspans and colspans.
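For readers who have not seen TWiki markup, the following Python sketch shows the target syntax for a few of the features listed above: headings, bold and italic text, and nested bullet and numbered lists. It is an illustration of standard TWiki syntax, not the Perl script's own code; the helper names are hypothetical. Underline, links, colors and tables are left out of the sketch.

```python
def twiki_heading(text: str, level: int) -> str:
    """Level 1 gives '---+ text', level 2 gives '---++ text', and so on."""
    if not 1 <= level <= 6:
        raise ValueError("TWiki headings use levels 1 to 6")
    return f"---{'+' * level} {text}"


def twiki_inline(text: str, bold: bool = False, italic: bool = False) -> str:
    """Wrap text in TWiki emphasis markers: *bold*, _italic_, __bold italic__."""
    if bold and italic:
        return f"__{text}__"
    if bold:
        return f"*{text}*"
    if italic:
        return f"_{text}_"
    return text


def twiki_list_item(text: str, level: int = 1, marker: str = "*") -> str:
    """Indent three spaces per list level, then the marker and the text."""
    if level < 1:
        raise ValueError("list level starts at 1")
    if marker not in {"*", "1.", "a."}:
        raise ValueError("marker must be '*', '1.' or 'a.'")
    return f"{'   ' * level}{marker} {text}"
```

Because the list level is encoded purely as indentation in multiples of three spaces, a level that cannot be detected in the source HTML cannot be reproduced in the output, which is why nesting survives only when it is detectable.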
Other HTML Tags
- Other HTML tags are preserved, but kept to a minimum. Refer to the section [Preserve tags] in the configuration file; users can add tags to or delete tags from the list.
- Custom bullet and numbered list styles are supported. Refer to the section [CustomerStyleDefinition] in the configuration file.
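The idea of a preserve list can be sketched in Python as follows. This is a simplified illustration and not the script's implementation: the tag names in `PRESERVE_TAGS` are made-up examples (the real list lives in the [Preserve tags] section of the configuration file), the function name is hypothetical, and the regular expression ignores edge cases such as a ">" inside an attribute value.

```python
import re

# Hypothetical example list; the real tags come from the configuration file.
PRESERVE_TAGS = frozenset({"sub", "sup", "br"})

# Matches an opening, closing or self-closing tag and captures its name.
TAG_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def strip_unlisted_tags(html: str, preserve: frozenset = PRESERVE_TAGS) -> str:
    """Remove every tag whose name is not in preserve; keep the text between tags."""
    def keep_or_drop(match: re.Match) -> str:
        name = match.group(1).lower()
        return match.group(0) if name in preserve else ""

    return TAG_PATTERN.sub(keep_or_drop, html)
```

For example, with the list above, `strip_unlisted_tags('<p>H<sub>2</sub>O</p>')` keeps the `<sub>` pair and drops the `<p>` pair.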
Perl Script Features
- Images are zipped. No images are lost.
- Recursively processes HTML files in sub-folders.
- Creates well-formatted, clean text that needs minimal proofreading and correction.
- Hidden text is removed.
- Special text is removed. This text is defined in the section [Text To Be Removed] in the configuration file. Users should edit this section.
- Title, author and date are collected. The title must be in the first few lines, followed by the authors and the date.
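The last two features can be pictured with a small Python sketch. It is a guess at the general approach, not the script's code: it assumes that the configured text is removed as plain substrings, that blank lines are dropped afterwards, and that the first three remaining lines are the title, the authors and the date, in that order. Both function names are hypothetical.

```python
def remove_configured_text(lines: list[str], phrases: list[str]) -> list[str]:
    """Delete each configured phrase from every line, then drop blank lines."""
    cleaned = []
    for line in lines:
        for phrase in phrases:
            line = line.replace(phrase, "")
        if line.strip():
            cleaned.append(line.strip())
    return cleaned


def collect_metadata(lines: list[str]) -> dict[str, str]:
    """Read title, author and date from the first three lines (assumed order)."""
    head = lines[:3] + [""] * 3  # pad so short documents still unpack
    title, author, date = head[:3]
    return {"title": title, "author": author, "date": date}
```

Under these assumptions, any stray line left above the title would be mistaken for the title, which is one reason the metadata still needs checking by hand during proofreading.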
Software Requirement
- ActivePerl for Windows
Perl Script and ConvertWordToHTML.doc Installation
- Download the MsWordToTWikiOnWindowsAddOn.zip file.
- Open MsWordToTWikiOnWindowsAddOn.zip and extract all files, including the directory structure. The following files are installed:
|| Used for MS Office Word 2007.
|| Used for earlier versions of MS Office Word.
|| Script used to convert HTML (MS Word) documents to TWiki markup.
|| Configuration file used to control Perl script execution. Users must edit this file before using the Perl script.
Requirements for Word Documents
To reduce conversion errors and the time spent on proofreading and editing, the Word document should use the following Word features and observe the following restrictions:
- Bulleted and numbered lists.
- Endnotes and footnotes.
- Cross references.
- Do not insert a table within a table cell.
- Do not add free shapes on top of an existing image.
- Word drawings, including text boxes, have to be reformatted as images (gif, jpg, bmp, png, etc.).
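One of the restrictions above, no table inside a table cell, can be checked on the exported HTML before running the conversion. The Python sketch below is an optional pre-flight check and is not part of the add-on; the class and function names are hypothetical, and it relies only on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class NestedTableFinder(HTMLParser):
    """Track how deeply <table> elements are nested while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def has_nested_table(html: str) -> bool:
    """Return True if any table is opened while another table is still open."""
    finder = NestedTableFinder()
    finder.feed(html)
    finder.close()
    return finder.max_depth > 1
```

A document for which `has_nested_table` returns True would need its inner table moved out of the cell in Word before conversion.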
- June 23, 2009