Convert MS Word Document to TWiki Markup (Windows Only)
Function
- Uses a VBA macro to convert multiple MS Word documents to HTML files, then a Perl script to convert any number of HTML files in any number of sub-folders (searched recursively) to TWiki markup text files.
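The recursive search can be pictured with a short sketch. It is written in Python for illustration only, since the add-on's own Perl code is not shown here. The function name `find_html_files` is hypothetical, and skipping folders whose names end in "_files" (where Word's "Save as Web Page" keeps supporting files) is an assumption, not confirmed behavior of the script.

```python
from pathlib import Path

HTML_SUFFIXES = (".htm", ".html")


def find_html_files(root):
    """Return every .htm/.html file below root, searching all sub-folders."""
    root = Path(root)
    found = []
    for p in root.rglob("*"):
        if not p.is_file() or p.suffix.lower() not in HTML_SUFFIXES:
            continue
        # Assumption: folders ending in "_files" hold Word's supporting files,
        # not documents to convert, so HTML files inside them are skipped.
        if any(part.endswith("_files") for part in p.relative_to(root).parts[:-1]):
            continue
        found.append(p)
    return sorted(found)
```

Called as `find_html_files(r"C:\Word2TWiki\DocHTML")`, it would list the HTML files at every depth below that folder.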
Usage
Step 1: Convert Word Document to HTML File
- Open "ConvertWordToHTML.doc" (or "ConvertWordToHTML.docm" for Word 2007) from C:\Word2TWiki. If the buttons do not work, enable macros first.
- Use one of the following options to convert Word documents to HTML files. The HTML files are saved in C:\Word2TWiki\DocHTML.
- Click the "Convert All Word Documents From Folder" button if you have copied the Word documents to C:\Word2TWiki\DocWord.
- Click the "Convert All Opened Files (Exclude Me)" button if you have already opened all the Word documents.
Step 2: Convert HTML File to TWiki Markup
From the command line:
>perl "HTML(MSWord)2TWikiMarkup.pl" [/c][/a][/v][/i|h|help]
/c Copyright
/a Author
/v Version
/i|h|help Generate this document in TWikiML
Or double-click "HTML(MSWord)2TWikiMarkup.pl" in C:\Word2TWiki.
- The TWiki markup text file ("_FileTitle.txt") for each document is saved in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
- That folder also contains a zip file with all of the document's images.
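The Step 2 output layout can be sketched as follows. This is Python written for illustration, not the Perl script's code. The function names, the archive name "<FileTitle>_images.zip", the assumption that FileTitle is the HTML file name without its extension, and the assumption that Word saved the images in a "<FileTitle>_files" folder beside the HTML file are all hypothetical.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".gif", ".jpg", ".jpeg", ".png", ".bmp"}


def output_paths(html_file):
    """Hypothetical layout: <folder>/twiki_<FileTitle>/_<FileTitle>.txt.

    FileTitle is assumed to be the HTML file name without its extension.
    """
    html_file = Path(html_file)
    title = html_file.stem
    out_dir = html_file.parent / f"twiki_{title}"
    return out_dir, out_dir / f"_{title}.txt"


def zip_images(html_file):
    """Zip the images from <FileTitle>_files into the output folder.

    Returns the archive path and the names stored in it.
    """
    html_file = Path(html_file)
    image_dir = html_file.parent / f"{html_file.stem}_files"  # assumed location
    out_dir, _ = output_paths(html_file)
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{html_file.stem}_images.zip"  # hypothetical name
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        if image_dir.is_dir():
            for p in sorted(image_dir.rglob("*")):
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
                    arcname = p.relative_to(image_dir).as_posix()
                    zf.write(p, arcname)  # store with a path relative to image_dir
                    names.append(arcname)
    return zip_path, names
```

Keeping every image in one archive per document matches the "no images are lost" goal: the archive can be attached to the TWiki topic in one step.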
Step 3: Proofread TWiki Markup Documents
- Open the text file "_FileTitle.txt" in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
- Edit the title, author and issue date as necessary.
- Remove any unneeded lines at the beginning of the file.
- Proofread the file and correct any errors.
Features
- Headings. The heading levels included in the TOC can be changed in the configuration file.
- Italic, underline, bold and bold italic.
- HTML links, email links and cross-references.
- Unicode for symbols (refer to "Unicode alternatives for Greek and special characters in HTML").
- Text color.
- Simple bullet lists (" * ") and numbered lists (" 1. " and " a. "). List levels are preserved when they can be detected.
- Tables with or without rowspans and colspans.
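The TWiki side of several of these features can be sketched as small string helpers. This is a Python illustration, not the Perl script's code, and the helper names are hypothetical. Keeping `<u>` for underline is an assumption, since TWiki has no underline shorthand. The color helper assumes the standard TWiki color variables (%RED%, %BLUE% and so on), and symbols are assumed to be written as numeric character references.

```python
def heading(level, text):
    """TWiki heading: "---+" for level 1, "---++" for level 2, and so on."""
    return "---" + "+" * level + " " + text.strip()


def emphasis(text, bold=False, italic=False, underline=False):
    """Apply TWiki emphasis markers to a run of text."""
    if underline:
        # Assumption: no TWiki underline shorthand, so keep the HTML tag.
        # It goes inside the markers, since TWiki emphasis markers generally
        # need whitespace (not ">") in front of them.
        text = f"<u>{text}</u>"
    if bold and italic:
        return f"__{text}__"
    if bold:
        return f"*{text}*"
    if italic:
        return f"_{text}_"
    return text


def link(url, label):
    """TWiki bracket link: [[url][label]]."""
    return f"[[{url}][{label}]]"


def email_link(address, label):
    """Email link as a bracket link to a mailto: URL."""
    return link("mailto:" + address, label)


def colored(text, color):
    """Wrap text in a TWiki color variable such as %RED% ... %ENDCOLOR%."""
    return f"%{color.upper()}%{text}%ENDCOLOR%"


def list_item(text, level=1, marker="*"):
    """Bullet ("*") or numbered ("1.", "a.") item; each level adds three spaces."""
    return "   " * level + f"{marker} {text}"


def to_entities(text):
    """Write non-ASCII symbols as numeric character references, e.g. &#945;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)
```

For example, `list_item("Step", 2, "1.")` yields a second-level numbered item indented by six spaces, which is how TWiki nests lists.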
Other HTML Tags
- Other HTML tags are preserved but kept to a minimum. Users can add tags to or delete tags from the list in the [Preserve tags] section of the configuration file.
- Custom bullet and numbered list styles are supported. See the [CustomerStyleDefinition] section of the configuration file.
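How a [Preserve tags] list might drive tag filtering is sketched below in Python. The section name comes from this page, but the entry format (one bare tag name per line) and the function names are assumptions; the real configuration file may be laid out differently.

```python
import configparser
import re

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def load_preserved_tags(ini_text, section="Preserve tags"):
    """Read tag names listed one per line in the given INI section.

    Assumption: the section holds bare tag names such as "sup" or "sub".
    """
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(ini_text)
    return {name.strip().lower() for name in parser[section]}


def strip_unlisted_tags(html, keep):
    """Remove every tag not named in `keep`, leaving the enclosed text."""
    return TAG_RE.sub(
        lambda m: m.group(0) if m.group(1).lower() in keep else "", html
    )
```

With only `sup` and `sub` listed, a Word `<span>` wrapper is dropped while a superscript survives, which keeps the markup minimal as described above.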
Perl Script Features
- Images are zipped, so no images are lost.
- Recursively processes HTML files in sub-folders.
- Creates clean, well-formatted text that needs minimal proofreading and correction.
- Removes hidden text.
- Removes special text defined in the [Text To Be Removed] section of the configuration file. Users should edit this section.
- Collects the title, author and date. The title must be in the first few lines, followed by the authors and then the date.
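The title/author/date rule can be sketched as a simple heuristic in Python: the first non-blank line is the title, following lines are authors until a line parses as a date. The date formats, the 10-line window standing in for "the first few lines" and the function names are hypothetical choices, not the script's actual logic.

```python
from datetime import datetime

# Hypothetical set of accepted date formats (English month names).
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d", "%m/%d/%Y")


def parse_date(line):
    """Return a date if the line matches one of DATE_FORMATS, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(line.strip(), fmt).date()
        except ValueError:
            continue
    return None


def collect_header(lines, max_lines=10):
    """Return (title, authors, date) from the first few non-blank lines."""
    head = [line.strip() for line in lines if line.strip()][:max_lines]
    if not head:
        return None, [], None
    title, authors, date = head[0], [], None
    for line in head[1:]:
        date = parse_date(line)
        if date is not None:
            break  # authors end where the date line begins
        authors.append(line)
    return title, authors, date
```

If no date line is found in the window, every remaining line is treated as an author, which is one reason Step 3 asks you to check the title, author and date by hand.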
Software Requirements - ActivePerl for Windows
Perl Script and ConvertWordToHTML.doc Installation
- Download the MsWordToTWikiOnWindowsAddOn.zip file.
- Open MsWordToTWikiOnWindowsAddOn.zip and extract all files, preserving the directory structure. The following files are installed:
| File | Directory | Comment |
| ConvertWordToHTML.docm | C:\Word2TWiki | Used with MS Office Word 2007. |
| ConvertWordToHTML.doc | C:\Word2TWiki | Used with earlier versions of MS Office Word. |
| HTML(MSWord)2TWikiMarkup.pl | C:\Word2TWiki | Script that converts HTML (MS Word) documents to TWiki markup. |
| _HTML(MSWord)2TWikiMarkup.ini | C:\Word2TWiki | Configuration file that controls the Perl script. Users must edit this file before running the script. |
Requirements for Word Documents
To reduce conversion errors and proofreading and editing time, Word documents should use the following Word features and guidelines:
- Headings.
- Captions.
- Bullet and numbered lists.
- Endnotes and footnotes.
- Cross-references.
- Do not insert a table within a table cell.
- Do not draw freeform shapes on top of an existing image.
- Word drawings, including text boxes, must be converted to images such as GIF, JPG, BMP or PNG.
Add-On Info
- Set SHORTDESCRIPTION = Convert MS Word Document to TWiki Markup (Windows Only)
--
CharlieMao - June 23, 2009