NmConvertTextToTWiki < Wikilearn

Expanding on the NEdit macro in Nm Undo Hard Word Wrap, this macro aims to do a lot of the conversion of the text format of "Dive Into Python" to TWiki markup.

Unfortunately, I think many similar conversions will be just different enough that this macro will have to be tweaked for every document, but maybe after I do enough of them (do I want to?), I will recognize some patterns.

One gotcha for this file is the \xa0 that is used as (2) spaces in front of the bullets (asterisks): we need to convert the \xa0's to ordinary spaces (\x20) and increase the quantity to 3. Other problems are the TOC (which I will delete and manually replace with a TWiki TOC) and URLs that span a line break (is there really a \n there, or is it just a "continuous" break?).

Update: \xa0 is (1) a non-breaking space, and (2) used in quite a few places in the file. I think to make the "core" parts of this macro more general / reusable, I will convert all \xa0's to ordinary spaces before other processing. I will also create macros to reinsert \xa0's when appropriate, for example, a macro that creates a TWiki multi-word definition markup — given that the multi-word phrase is selected, it replaces all spaces within that selection with \xa0's, then inserts three spaces before it, a colon and space after it, and deletes the next newline so that the following line is treated as the definition. (Maybe it could even be smart enough to take one of several actions, depending on whether there was additional text (the definition) on the same line or not.)
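The definition macro described above could look roughly like this. It is a Python sketch of the idea only, not NEdit macro code, and the name make_definition is made up for illustration. It assumes the term and the definition have already been picked out of the text.

```python
NBSP = "\xa0"  # non-breaking space


def make_definition(term: str, definition: str) -> str:
    """Build one TWiki-style definition line from a multi-word term."""
    # Join the words of the term with non-breaking spaces so the
    # multi-word phrase is treated as a single term.
    joined_term = NBSP.join(term.split())
    # Three leading spaces, then "term: definition" on a single line
    # (this is the "delete the next newline" step from the notes).
    return "   " + joined_term + ": " + definition.strip()


if __name__ == "__main__":
    print(repr(make_definition("hard word wrap", "a newline forced at a fixed column\n")))
```

Handling the case where the definition is already on the same line would just mean skipping the newline removal.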

Update: More problems to consider: when a bullet item exceeds a single line, the text is hard wrapped (\n) and the next line is indented five spaces. I suppose I could always delete spaces at the beginning of a line (except for leading spaces before an asterisk, for example), but I probably need to usually replace those leading spaces with a single space, sometimes with no spaces (URL broken across lines), and sometimes with double spaces (previous line ended a sentence). What a pain. I may continue with my cut and paste followed by manual fixup for the time being; the major problem with that is that links with hidden URLs don't carry the URL over in the cut and paste.
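The three joining cases above (one space, no space, two spaces) can be sketched in Python as follows. This is only a guess at workable rules, not something taken from the macro: join_wrapped is a hypothetical helper, and the URL test (the last word of the line starts with a URL scheme) is an assumption that will misfire when a complete URL happens to end a line.

```python
URL_SCHEMES = ("http://", "https://", "ftp://")  # assumed list of schemes


def join_wrapped(prev: str, cont: str) -> str:
    """Join a line with its hard-wrapped continuation line."""
    prev = prev.rstrip()
    cont = cont.lstrip()  # drop the continuation indent (five spaces in this file)
    words = prev.split()
    last_word = words[-1] if words else ""
    if last_word.startswith(URL_SCHEMES):
        sep = ""    # assume the wrap cut a URL in two
    elif prev.endswith((".", "?", "!")):
        sep = "  "  # previous line ended a sentence
    else:
        sep = " "   # ordinary wrap inside a sentence
    return prev + sep + cont


if __name__ == "__main__":
    print(join_wrapped("See http://example.com/a", "     b/c.html"))
```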

See:

General Comments
The Normal Approach (Pseudo Code)
Dealing With Bullets
- Preparation
Combined Pseudocode
Let's Add Code
Contributors
Revision Comment
Page Ratings

General Comments

Troubleshooting is easier if you use "replace_in_selection" to deal with portions of the document at a time. When you're done, don't bother converting to "replace_all"; simply add a select_all() before the macro and a deselect_all() after it (or select the entire document before running the macro).

A recorded keystroke tidbit — note the double escapes:

find("^\\xa0{2}\\*", "forward", "regex", "wrap")
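The same double escaping shows up in any language whose string literals consume one level of backslashes before the regex engine sees the pattern. As an illustration only (Python rather than NEdit macro language, with a made-up function name), the pattern above behaves like this:

```python
import re

# Written with doubled backslashes, as in the recorded macro line.
# The regex engine itself receives: ^\xa0{2}\*
PATTERN = "^\\xa0{2}\\*"


def starts_with_nbsp_bullet(line: str) -> bool:
    """True if the line starts with two non-breaking spaces and an asterisk."""
    return re.search(PATTERN, line) is not None


if __name__ == "__main__":
    print(starts_with_nbsp_bullet("\xa0\xa0* item"))
    print(starts_with_nbsp_bullet("  * item"))
```

Note that the pattern matches only non-breaking spaces, so it stops matching once the \xa0's have been converted to ordinary spaces.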

The Normal Approach (Pseudo Code)

For a hard wrapped document without complications such as bullets (which can be single or double spaced):

Delete extraneous whitespace from otherwise blank lines
Convert all blank lines (\n\n) to some otherwise unused string ("@@@@@@@@@@")
Delete all remaining newlines (\n) and replace with, hmm, it depends on the situation, doesn't it — could be a (1) space (within a sentence), two spaces after a sentence, or maybe no spaces if a URL is broken by a \n — I'll think about this some more
Restore all blank lines by replacing "@@@@@@@@@@" with \n\n, oops, make that "@@@@@" with \n to handle single and double spaced bullets
Is there any cleanup to do? Restore (or set) cursor? Unselect document? Delete leading and trailing spaces?
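A rough Python model of the steps above, to check that the placeholder trick works. It is not the macro: unwrap and MARK are names invented here, every remaining newline is simply replaced with one space (the open question about one, two, or no spaces is ignored), and the cleanup step is just one possible answer to the last question.

```python
import re

MARK = "@@@@@"  # placeholder, assumed not to occur anywhere in the text


def unwrap(text: str) -> str:
    """Undo hard word wrap while keeping blank lines between paragraphs."""
    # Delete extraneous whitespace from otherwise blank lines.
    text = re.sub(r"^[ \t]+$", "", text, flags=re.M)
    # Protect blank lines: every newline in a run of two or more becomes MARK.
    text = re.sub(r"\n{2,}", lambda m: MARK * len(m.group()), text)
    # Replace all remaining newlines with a single space.
    text = text.replace("\n", " ")
    # Restore the protected newlines.
    text = text.replace(MARK, "\n")
    # Cleanup (assumption): drop trailing spaces left at the ends of lines.
    return re.sub(r"[ \t]+$", "", text, flags=re.M)


if __name__ == "__main__":
    print(unwrap("line one\nline two\n   \nnext para\nmore\n"))
```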

Dealing With Bullets

Preparation

Normalize
1. Convert all "\xa0" to " "
2. Convert "\n  * " (2 spaces) to "\n   * " (3 spaces)
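The two normalization steps, sketched in Python to make the exact spacing visible. The function name normalize_bullets is hypothetical, and the regex assumes bullets sit at the very start of a line.

```python
import re


def normalize_bullets(text: str) -> str:
    """Turn \\xa0-indented bullets into TWiki bullets with a three-space indent."""
    # 1. Convert every non-breaking space to an ordinary space.
    text = text.replace("\xa0", " ")
    # 2. Widen a two-space bullet indent to three spaces.
    #    Bullets that already have three spaces do not match and stay as they are.
    return re.sub(r"^  \* ", "   * ", text, flags=re.M)


if __name__ == "__main__":
    print(repr(normalize_bullets("Intro\n\xa0\xa0* one\n\xa0\xa0* two")))
```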

Are there other similar cases?

I think double spaced bullets will be handled properly by the "Normal Approach" — is that true?

For single spaced bullets (i.e., "\n *" but not "\n\n *"; oops, looks like I'll have to get the "\n\n *"s out of the way first, maybe by doing the \n\n -> @@@@@@@@@@ conversion first)
Then replace "\n *" with "@@@@@ *"

Combined Pseudocode

#select_all()

Normalize
1. Convert all "\xa0" to " "
2. Convert "\n  * " (2 spaces) to "\n   * " (3 spaces)
Delete extraneous whitespace from otherwise blank lines
Convert all blank lines (\n\n) to some otherwise unused string ("@@@@@@@@@@")
Replace "\n *" with "@@@@@ *"
Delete all remaining newlines (\n) and replace with, hmm, it depends on the situation, doesn't it — could be a (1) space (within a sentence), two spaces after a sentence, or maybe no spaces if a URL is broken by a \n — I'll think about this some more
Restore all blank lines by replacing "@@@@@@@@@@" with \n\n, oops, make that "@@@@@" with \n to handle single and double spaced bullets
Is there any cleanup to do? Restore (or set) cursor? Unselect document? Delete leading and trailing spaces?
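Here is the combined pseudocode as one small Python function, mainly to check the order of the steps. It is a model under assumptions, not the NEdit macro: text_to_twiki and MARK are invented names, wrapped lines are always joined with a single space, and continuation indents are dropped along with the newline.

```python
import re

MARK = "@@@@@"  # placeholder, assumed not to occur anywhere in the text


def text_to_twiki(text: str) -> str:
    """Unwrap hard-wrapped text while keeping paragraphs and bullets apart."""
    # Normalize: \xa0 to space, then two-space bullets to three-space bullets.
    text = text.replace("\xa0", " ")
    text = re.sub(r"^  \* ", "   * ", text, flags=re.M)
    # Delete extraneous whitespace from otherwise blank lines.
    text = re.sub(r"^[ \t]+$", "", text, flags=re.M)
    # Protect blank lines first, so double spaced bullets are out of the way.
    text = re.sub(r"\n{2,}", lambda m: MARK * len(m.group()), text)
    # Protect the newline in front of single spaced bullets.
    text = text.replace("\n   * ", MARK + "   * ")
    # Join the remaining wrapped lines, dropping any continuation indent.
    text = re.sub(r"\n[ \t]*", " ", text)
    # Restore the protected newlines.
    text = text.replace(MARK, "\n")
    # Cleanup (assumption): remove trailing spaces, keep leading bullet indents.
    return re.sub(r"[ \t]+$", "", text, flags=re.M)


if __name__ == "__main__":
    sample = (
        "Intro line one\nline two.\n\n"
        "\xa0\xa0* first item that\n     wraps\n\xa0\xa0* second\n\nNext para\n"
    )
    print(text_to_twiki(sample))
```

Doing the blank-line conversion before the bullet conversion is what lets the same "@@@@@" marker serve both single and double spaced bullets.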

Let's Add Code

#select_all()

Normalize
1. Convert all "\xa0" to " "
2. Convert "\n  * " (2 spaces) to "\n   * " (3 spaces)

#replace_in_selection("\\xa0", " ", "regex")
#replace_in_selection("^  \\* ", "   * ", "regex")

Optional extra cleanup — remove all extra spaces to the left

#replace_in_selection("^\\s*", "", "regex")

Delete extraneous (white?)space from otherwise blank lines

#replace_in_selection("^\\s+$", "", "regex")

Convert all blank lines (\n\n) to some otherwise unused string ("@@@@@@@@@@")

#replace_in_selection("^$", "@@@@@@@@@@", "regex")

Replace "\n *" with "@@@@@ *"

#replace_in_selection("^ {3}\\*", "@@@@@   *", "regex")

Delete all remaining newlines (\n) and replace with, hmm, it depends on the situation, doesn't it — could be a (1) space (within a sentence), two spaces after a sentence, or maybe no spaces if a URL is broken by a \n — I'll think about this some more

#replace_in_selection("\\n", " ", "regex")

Restore all blank lines by replacing "@@@@@@@@@@" with \n\n, oops, make that "@@@@@" with \n to handle single and double spaced bullets

#replace_in_selection("@@@@@", "\\n", "regex")

Is there any cleanup to do? Restore (or set) cursor? Unselect document? Delete leading and trailing spaces?

Optional extra clean up — extra blank space to the right and left of text

#replace_in_selection("^\\s*", "", "regex")
#replace_in_selection("\\s*$", "", "regex")

#deselect_all()
#beginning_of_file()

Contributors

() RandyKramer - 06 Sep 2003
If you edit this page: add your name here; move this to the next line; and if you've used a comment marker (your initials in parenthesis), include it before your WikiName.

Revision Comment

%DATE% —

Page Ratings

WebForm
PageStatus	Scribbles

Topic revision: r5 - 2005-04-06 - RandyKramer

Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.