Use VBA to convert multiple MS Word documents to HTML files. Use a Perl script to convert any number of HTML files in any number of sub-folders to TWiki markup text files (using a recursive search).
Step 1: Convert Word Documents to HTML Files
Open "!ConvertWordToHTML.doc" from C:\Word2TWiki. (If the buttons do not work, enable macros first.)
Use one of the following options to convert Word documents to HTML files, which are saved in C:\Word2TWiki\DocHTML.
Click the "Convert All Word Documents From Folder" button if you have copied the Word documents to C:\Word2TWiki\DocWord.
Click the "Convert All Opened Files (Exclude Me)" button if you have opened all the Word documents.
Step 2: Convert HTML File to TWiki Markup
From the command line:
>perl "HTML(MSWord)2TWikiMarkup.pl" [/c][/a][/v][/i|h|help]
/[c] Copyright
/[a] Author
/[v] Version
/[i|h|help] Generate this document in TWikiML
Or double-click "HTML(MSWord)2TWikiMarkup.pl" from C:\Word2TWiki.
TWiki markup text files ("_FileTitle.txt") are saved in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
The same folder contains a zip file that holds all the images.
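The recursive search and the output naming described above can be sketched as follows. This is a minimal illustration in Python, not the Perl script's actual code. The function names find_html_files and twiki_output_path are hypothetical, and the sketch assumes that FileTitle is the HTML file name without its extension and that the twiki_FileTitle folder is created next to the HTML file.

```python
import os


def find_html_files(root):
    """Yield every .htm/.html file under root, descending into all sub-folders."""
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith((".htm", ".html")):
                yield os.path.join(folder, name)


def twiki_output_path(html_path):
    """Assumed naming scheme: <folder>/twiki_<FileTitle>/_<FileTitle>.txt."""
    folder, name = os.path.split(html_path)
    title = os.path.splitext(name)[0]
    return os.path.join(folder, "twiki_" + title, "_" + title + ".txt")


if __name__ == "__main__":
    # The path below is the default folder named in this document.
    for path in find_html_files(r"C:\Word2TWiki\DocHTML"):
        print(path, "->", twiki_output_path(path))
```

os.walk visits every sub-folder without a depth limit, which matches the "any number of sub-folders" behavior described for the script.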
Step 3: Proofread TWiki Markup Documents
Open the text file "_FileTitle.txt" in the twiki_FileTitle sub-folder of C:\Word2TWiki\DocHTML.
Simple bullet lists (" * ") and numbered lists (" 1. " and " a. "). Nesting levels are preserved if they are detectable.
Tables with or without rowspans and colspans.
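To illustrate how nested lists map to TWiki markup, here is a minimal Python sketch. It is not the Perl script's code: it assumes the lists arrive as plain ul/ol/li elements (Word often emits styled paragraphs instead, which is presumably why levels are preserved only when detectable), and the class and function names are hypothetical. TWiki indents list items by three spaces per nesting level.

```python
from html.parser import HTMLParser


class ListToTWiki(HTMLParser):
    """Convert nested <ul>/<ol> lists into TWiki list lines."""

    def __init__(self):
        super().__init__()
        self.markers = []  # marker of each open list, innermost last
        self.lines = []
        self.text = None   # text of the current item, None outside <li>

    def _flush(self):
        # Emit the pending item, if any, at the current nesting level.
        if self.text is not None and self.text.strip() and self.markers:
            indent = "   " * len(self.markers)  # three spaces per level
            body = " ".join(self.text.split())
            self.lines.append(f"{indent}{self.markers[-1]} {body}")
        self.text = None

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self._flush()  # the parent item's text ends where a nested list starts
            self.markers.append("*")
        elif tag == "ol":
            self._flush()
            lettered = dict(attrs).get("type") == "a"
            self.markers.append("a." if lettered else "1.")
        elif tag == "li":
            self._flush()
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()
        elif tag in ("ul", "ol") and self.markers:
            self._flush()
            self.markers.pop()

    def handle_data(self, data):
        if self.text is not None:
            self.text += data


def html_list_to_twiki(html):
    parser = ListToTWiki()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

For example, html_list_to_twiki("<ul><li>Fruit<ol><li>Apple</li></ol></li></ul>") yields a bullet item at three spaces of indent followed by a numbered item at six.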
Other HTML Tags
Other HTML tags are preserved, but only a minimal set is kept. The list is defined in the [Preserve tags] section of the configuration file; users can add tags to it or delete tags from it.
Custom bullet and numbered list styles are supported. Refer to the [CustomerStyleDefinition] section of the configuration file.
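The idea of keeping only a configured set of tags can be sketched like this. It is a hedged Python illustration, not the script's implementation: the preserved tag names are passed in as a plain list because the configuration file format is not described here, and the names TagFilter and keep_only are hypothetical. Tags outside the list are dropped while their text content is kept.

```python
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Drop every tag that is not in the preserved set, keeping the text."""

    def __init__(self, preserved):
        # convert_charrefs=False so entities such as &amp; pass through unchanged
        super().__init__(convert_charrefs=False)
        self.preserved = {tag.lower() for tag in preserved}
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.preserved:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written.
        if tag in self.preserved:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.preserved:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def keep_only(html, preserved):
    parser = TagFilter(preserved)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With preserved set to ["sub", "sup"], the input '<p>H<sub>2</sub>O</p>' comes back as 'H<sub>2</sub>O'.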
Perl Script Features
Images are zipped. No images are lost.
Recursively processes HTML files in sub-folders.
Creates well-formatted, clean text that needs minimal proofreading and correction.
Hidden text is removed.
Special text is removed. This text is defined in the [Text To Be Removed] section of the configuration file. Users should edit this section.
Title, author and date are collected. The title must appear in the first few lines, followed by the authors and the date.
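As an illustration of the first feature in the list above (images are zipped and none are lost), the following Python sketch archives the images of one folder and then checks that each one is present in the archive. It is an assumption-laden example rather than the script's code: the function name zip_images and the set of image extensions are hypothetical.

```python
import os
import zipfile

# Assumed set of image types; the real script may handle others.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def zip_images(folder, zip_path):
    """Add every image file in folder to zip_path and return the archived names."""
    images = sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in images:
            archive.write(os.path.join(folder, name), arcname=name)

    # Re-open the archive and confirm that no image was lost.
    with zipfile.ZipFile(zip_path) as archive:
        missing = set(images) - set(archive.namelist())
    if missing:
        raise RuntimeError(f"images missing from archive: {sorted(missing)}")
    return images
```

Verifying the archive by re-reading its name list is a cheap way to back the "no images are lost" claim before any original files are moved or deleted.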