
MediaWikiToTWikiAddOnDevArchive

Archive of discussion on the MediaWikiToTWikiAddOn

-- MichaelDaum - 29 Dec 2007


There are times when an existing MediaWiki site needs to be converted wholesale into TWiki. I am currently in this situation as we merge two research departments, each with a pre-existing wiki. The conversion raises some questions about how best to structure the converted wiki.

I am presenting what I have done (or will attempt) to handle the conversion. Please feel free to chime in with your comments, suggestions and ideas.

Capturing the MediaWiki pages (topics)

As of version 1.5.6 there is a page Special:AllPages, which provides the list of page names for the selected namespace. There is also a page Special:Export which, given a list of pages, produces XML containing the information for each page (owner, timestamp, and markup).

I created a Perl program which filters this output to produce the needed namespace:page names for the MediaWiki Special:Export page. An existing CPAN module, Parse::MediaWikiDump (see below), is used to read the XML.

I have another script which uses that module to extract the MediaWiki markup into pages, named TWiki-style, in a directory for each namespace. Each of these files can then be converted into TWiki markup and placed into the "proper" TWiki location. (to be completed)
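For illustration, the heart of that extraction script might look roughly like this. It is only a sketch, assuming the Parse::MediaWikiDump interface documented on CPAN (next(), title(), and text() returning a string reference), with simplified namespace and file handling:

use strict;
use warnings;
use Parse::MediaWikiDump;

my $pages = Parse::MediaWikiDump::Pages->new('export.xml');
while (defined(my $page = $pages->next)) {
    my $title = $page->title;        # e.g. "Talk:Main Page"
    my $text  = ${ $page->text };    # text() returns a reference to the markup
    # split off the namespace prefix; pages without one go into "Main"
    my ($namespace, $name) =
        $title =~ /:/ ? split(/:/, $title, 2) : ('Main', $title);
    (my $file = $name) =~ s/[\s\/]+/_/g;
    mkdir $namespace unless -d $namespace;
    open my $out, '>', "$namespace/$file.mw" or die "Cannot write $file: $!";
    print $out $text;
    close $out;
}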

MediaWiki page (topic) naming

MediaWiki allows spaces within a topic name and does not require CamelCase capitalization. Our solution is a common page-naming routine which consistently renames a MediaWiki page into a properly CamelCased TWiki topic.
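A minimal sketch of one possible naming rule (note that it lowercases the rest of each word, which may not be what you want for acronyms):

use strict;
use warnings;

# "database backup procedure" => "DatabaseBackupProcedure"
sub toTopicName {
    my ($title) = @_;
    return join '', map { ucfirst(lc($_)) } split /[\s_]+/, $title;
}

print toTopicName('database backup procedure'), "\n";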

Conversion of the MediaWiki markup language

Since both markups are line-oriented, this was a relatively straightforward task of creating a Perl program which reads MediaWiki markup and writes out TWiki markup. The MediaWiki table format is richer and allows for nested table layouts, as well as direct placement of table, row and cell specific parameters (like width, colspan, rowspan, valign, align, etc.). For the simplest tables, the standard "pipe" table markup can be used, with some improvement using the %TABLE{}% plugin. Otherwise, direct HTML is produced, which is structured to pass through any embedded markup (such as bold, italic, bullets, etc.).
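To give a flavour of the line-oriented rewriting, here is a heavily simplified sketch (only a handful of rules; the real converter has many more, plus the table handling described above):

use strict;
use warnings;

while (my $line = <STDIN>) {
    # headings: "== Title ==" => "---++ Title"
    $line =~ s/^(={1,6})\s*(.*?)\s*\1\s*$/'---' . ('+' x length($1)) . " $2\n"/e;
    # bold and italic (bold first, so ''' is not eaten by the '' rule)
    $line =~ s/'''(.+?)'''/*$1*/g;
    $line =~ s/''(.+?)''/_$1_/g;
    # bullets: "*", "**", ... => TWiki's three-spaces-per-level indentation
    $line =~ s/^(\*+)\s*/('   ' x length($1)) . '* '/e;
    print $line;
}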

How to convert the Mediawiki discussion pages

The MediaWiki Talk pages exist in a separate namespace. I see the TWiki equivalent handled by simply adding "Talk" as a suffix to the pre-existing topic (e.g. MediaWikiToTWikiAddOnDev and MediaWikiToTWikiAddOnDevTalk). The WebLeftBar could potentially display the link to a pre-existing Talk page, or provide a means to create a new one. The %COMMENT{}% plugin could be used on this page to structure the discussion.

How to handle ownership and timestamps of pages

Timestamps are straightforward: merely touch the file with the proper date. Preserving topic ownership is thornier, as the users need to exist before the page can be created (is this true?).
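For the timestamp part, something along these lines should do (the path and epoch value below are placeholders):

use strict;
use warnings;

my $topicFile = 'data/Main/SomeConvertedTopic.txt';   # hypothetical location
my $epoch     = 1155772800;                           # original MediaWiki timestamp
utime($epoch, $epoch, $topicFile)
    or die "Cannot set timestamp on $topicFile: $!";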

Converting MediaWiki History

This can easily be achieved using the rcs and ci command line utilities for creating RCS archives. Time/date stamps, ownership, and the change log message can be set using command line parameters of ci.
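A minimal sketch of checking in one imported revision, assuming ci(1) from RCS is on the PATH and the topic file has already been written (author, date and message here are placeholders taken from the MediaWiki history):

use strict;
use warnings;

my $topic  = 'data/Main/SomeConvertedTopic.txt';
my $author = 'SomeMediaWikiUser';
my $date   = '2006/08/17 12:00:00';
my $msg    = 'imported from MediaWiki';

# -u keeps the working file, -d/-w/-m set date, author and log message,
# -t- supplies the initial description without prompting
system('ci', '-u', "-d$date", "-w$author", "-m$msg", '-t-imported', $topic) == 0
    or die "ci failed for $topic";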

Converting MediaWiki Categories

  • Create a web "Category" and create pages which hold the indexes for each category. This would capture the existing information, but would be tedious to maintain.

  • Create a page for each category, named Category<Category name> (e.g. CategoryHardware), with the following line appended to the category page: %SEARCH{"$title" nosearch="on" nosummary="on"}% (see the sketch after this list).
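For illustration, such a category topic could consist of little more than a search for its own name (reading "$title" above as the category topic's name, e.g. via %TOPIC%), so that every topic mentioning the category gets listed:

---+ CategoryHardware
List of topics in this category:

%SEARCH{"%TOPIC%" nosearch="on" nosummary="on"}%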

Proper placement (and naming) of Image and Media attachments

MediaWiki seems to have two namespaces for this: Image and Media. Again, embedded spaces are handled "transparently": the link will have spaces, while the filename uses "_" (underscore). There may also be an issue with leading capital letters: the link can be lowercase (Image:wiki.png), while the file is named Wiki.png. I have not explored the Media namespace yet.

MediaWiki stores all attachments under a single directory, with sub-directories for (balancing?). The simplest approach would be to copy each needed attachment to the "proper" location under its topic. This could create duplicate files where a single copy existed before. Maybe a hybrid solution: create a new topic MediaWikiShared which would be the parent for all multiply referenced attachments?
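A minimal sketch of copying one attachment out of MediaWiki's hashed images/ tree into a topic's pub/ directory (the paths are placeholders; it searches by filename rather than recomputing MediaWiki's directory hash):

use strict;
use warnings;
use File::Find;
use File::Copy;

sub copyAttachment {
    my ($name, $web, $topic) = @_;
    my $found;
    # locate the file somewhere below the MediaWiki images/ directory
    find(sub { $found = $File::Find::name if $_ eq $name }, 'images');
    unless (defined $found) { warn "$name not found\n"; return; }
    my $dest = "pub/$web/$topic";
    mkdir $dest unless -d $dest;
    copy($found, "$dest/$name") or die "copy of $name failed: $!";
}

copyAttachment('Wiki.png', 'Main', 'SomeConvertedTopic');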

I have created a new plugin ImagePlugin which handles the MediaWiki Image:... formatting commands, which provide rich control of placement and sizing. Conversion is done by translating the MediaWiki [[Image:wiki.png|120|center]] into the TWiki version %Image{"wiki.png|120|center"}%. See Sandbox.ImgPluginEx2, which is based on Wikipedia:Wikipedia:Extended_image_syntax.
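The markup side of that conversion can be as simple as one substitution (shown here with the %IMAGE{...}% spelling mentioned further down this page; adjust to whatever tag your plugin version actually uses):

use strict;
use warnings;

my $text = 'See [[Image:wiki.png|120|center]] for the logo.';
# pass the MediaWiki parameter string through to the plugin unchanged
$text =~ s/\[\[Image:([^\]\[]+)\]\]/%IMAGE{"$1"}%/g;
print "$text\n";    # See %IMAGE{"wiki.png|120|center"}% for the logo.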

This Plugin will be posted after I get done with the conversion.

-- CraigMeyer - 18 Aug 2006

Discussion

The company wants to convert its MediaWiki to TWiki. My MediaWiki 1.5.6 uses MySQL 4.1.11.

Is there a way to import the mysqldump file into TWiki?

-- ReidMaynard - 28 Jul 2006

Ah, a seldom seen request :-). I think a script like this (originating from the Trac Wiki engine) could be a source of inspiration?

#!/usr/bin/python
#
# This script is provided AS IS, without any warranty!
# Copyright lio@lunesu.com, placed in the public domain
#
import os
import _mysql
# open the mediawiki sql db
db = _mysql.connect("localhost","wikiuser","twin","wikidb")
db.query("SELECT cur_title,cur_text from cur where cur_namespace < 3;")
rs = db.use_result()
while 1:
    row = rs.fetch_row()
    if row == ():
        break
    filename = row[0][0]
    wiki = row[0][1]
    # convert mediawiki to tracwiki
    wiki = wiki.replace("\n***","\n   *")
    wiki = wiki.replace("\n**", "\n  *")
    wiki = wiki.replace("\n*",  "\n *")
    wiki = wiki.replace("[[","wiki:")
    wiki = wiki.replace("]]","")
    wiki = wiki.replace("<br>","[[BR]]")
    wiki = wiki.replace("\n:","\n ")
    # todo: change titles?
    # fixme: could use piping to import (no temp files)
    #os.system("trac-admin /tracroot/iv wiki remove " + filename)
    # write to file
    f = open( filename, "w")
    f.write( wiki )
    f.close()
# import all wiki pages
os.system("trac-admin /tracroot/iv wiki load .")
# todo: remove files
db.close()

MediaWikiSyntaxPlugin has a short list of ideas for which syntaxes to support in conversion (some sources are in EditSyntaxPlugin).

Happy converting! Let us know how your progress goes :-)

-- SteffenPoulsen - 28 Jul 2006

And please share your converter with the growing TWikiCommunity! How about contributing a MediaWikiToTWikiAddOn converter? See other converters at Tag:import.

-- PeterThoeny - 28 Jul 2006

Any progress with the converter?

-- MatthiasThullner - 16 Aug 2006

Hello,

You are in luck :-) I am currently working on a MediaWiki => TWiki conversion set of utilities (for version 1.5.6!) written in Perl for work ;-) It turns out MediaWiki, using Special:Export, can export the MediaWiki markup (with owner, title, etc.) in XML form. There exists a CPAN module Parse::MediaWikiDump which pulls out the needed bits. I have written (and am improving) a MediaWiki => TWiki markup language converter. This handles all the obvious stuff, and I am working on supporting the MediaWiki table markup as well. The attachments can be handled by making a tarball of the directory and then placing the needed attachments into the proper TWiki location.

The ImgPlugin (to become ImagePlugin) TWiki plugin's %IMAGE{}% tag currently handles all the nice MediaWiki Image placement control features.

I will keep you posted on my progress.

-- CraigMeyer - 16 Aug 2006

Thank you, Craig, for working on this; we are looking forward to seeing your first version posted in the Plugins web.

-- PeterThoeny - 16 Aug 2006

Hi Craig, when will you post your first version? We also have the MediaWiki => TWiki conversion problem. Do you need any help?

-- ElmarBomberg - 17 Aug 2006

Elmar, here are the relevant bits from the status update I sent to my co-workers. I am now improving the convert program as I convert the pages and notice problems (like embedded tables).

Potential Issues:

WikiNames for pages (articles)

I will be converting the MediaWiki names into TWiki names (CamelCase, with no embedded spaces). To handle the Talk pages associated with some pages, I will create topics ending in "Talk": "MainPage" will have an associated page "MainPageTalk" if there exists a non-empty Talk page. ToDo: I need to work out the best way to preserve the ownership (author) of the pages.

Categories: I am still figuring out how to handle the categories; I need to explore TWiki a bit more to see how best to handle them.

UserNames: ??? Depends on how you want to handle mediawiki usernames.

History (of changes): I am only grabbing the most recent version of each page, so the history will not be preserved.

-- CraigMeyer - 17 Aug 2006

I received email asking what the current state is, and whether I am ready to send it out. I thought I should post my response here.

I have a Perl script which grabs the page names from the output of Special:Allpages; these are hand-pasted into Special:Export. I have a Perl script which parses the XML output from Special:Export and creates properly named TWiki files in a separate directory for each MediaWiki namespace (i.e. main:whatever, talk:whatever, Template:mytemplate, etc.). Then I have a Perl script which converts MediaWiki markup => TWiki markup. I am currently working on logic to handle nested tables, as well as determining whether the converter should create a simple TWiki table or use the HTML version.

So, the simple answer is no, not yet ;-)

-- CraigMeyer - 17 Aug 2006

On nested tables: You could create a TWiki table for the outer table so that you get the nice table formatting and sorting feature. The inner tables need to be HTML tables, all on one line.

One issue is multi-line content in a table cell. I do not know how to handle that properly. A workaround is to generate HTML tables if a table cell needs to be on multiple lines (such as for bullets). Alternatively, convert bullets into %BB%.
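For example, a multi-line MediaWiki cell with bullets could be flattened into one TWiki table row like this (assuming the standard %BB% rendering shortcut):

| *Feature* | *Notes* |
| Lists in a cell | %BB% first point %BB% second point |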

-- PeterThoeny - 17 Aug 2006

Peter, I was thinking it was the other way around: HTML on the outside, TWiki on the inside? I will try both ways and see how it works out. It shouldn't be hard to modify, as I create an internal model of the table before outputting.

I am using MediaWiki Help for test cases.

-- CraigMeyer - 17 Aug 2006

Interesting...

I noticed that %TABLE{}% has an undocumented "feature": the caption must be first in the list, or it is ignored.

I am not going to implement converting colspan and rowspan into || or ^.

-- CraigMeyer - 17 Aug 2006

I have attached a screenshot of a sample table conversion. (Note: I afterwards added a fix for the captions of the wiki.png table.)

-- CraigMeyer - 17 Aug 2006

Just noticed that ImagePlugin does exist here :-) MichaelDaum made some changes after I posted my early version. There are differences in scope and required support libraries between what Michael modified and what I created and am using (at work).

I have not verified whether the mediawiki syntax is still supported in the current TWiki modified version.

The code differences are substantial, so I am not sure how to proceed. Please read ImagePluginDev for more history.

-- CraigMeyer - 18 Aug 2006

Hi, Craig. The ImagePlugin still supports MediaWiki syntax. The only support library needed is CPAN:Image::Magick. Your old version of this plugin was using the command line version of it. So the requirements are not so different in the end.

That aside, I am also very interested in MediaWiki => TWiki conversion. Note, however, that I would be very happy to see just a partial solution as I don't need to cover all MediaWiki features 100% (e.g. nested tables). So even if you did not resolve every issue your contribution will be very valuable. We have a medium size MediaWiki installation and only need a one-shot conversion. Minor manual remastering is OK too as part of the existing data will be rewritten as a proper TWikiApplication anyway.

-- MichaelDaum - 18 Aug 2006

I have put together a converter from MediaWiki to TWiki that does a basic conversion of pages (including their history) and categories. It is attached as mw2twiki.pl. Our MediaWiki installation uses PostgreSQL, whereas I think most use MySQL, so you may need to edit the code to get it to work with your installation. I have utilized the regexes at MediawikiEditSyntaxRegex for a basic conversion of MediaWiki to TWiki format.

-- JohnSupplee - 28 Aug 2006

I just noticed EditSyntaxPlugin, which could be very useful when migrating, as it allows people who like the 'legacy' wiki syntax to carry on using it, even on topics written from scratch in TWiki's syntax (aka TWikiML or TML). Very impressive!

-- RichardDonkin - 01 Sep 2006

Craig What is the status of your conversion?

-- MichaelMazza - 15 Sep 2006

Thank you for asking . . . ;-)

Our research group is going through a re-organization, so my priorities have shifted somewhat (meaning, I have been working on other things ;-). I hope to re-focus on the conversion stuff sometime this week. I did create a new %TML% tag, which makes it easy to include TWiki markup within a TWiki table cell without resorting to HTML. I tucked it into another plugin which I use for useful helpers.

(modified to replace EMBED => TML )

-- CraigMeyer - 16 Sep 2006

Just noticed an EmbedPlugin already exists. That plugin is used to display MediaPlayer files within a page. What I am describing is different, and so needs a better name: TML.

Example usage:

| *Header1* | *Header2* |
| blah blah | %TML{---+++ Header line
   * Item 1
   * Item 2
}% |

-- CraigMeyer - 17 Sep 2006

Well, I spotted this page after I'd already written mine. It's not particularly different in concept or execution from JohnSupplee's below. The main differences are:

  • I don't do the joins in the SQL, so I actually complain when there's an inconsistency.
  • I write the title in a header at the beginning of the page text, like MediaWiki is wont to do automatically.
  • I didn't insert any meta info, because I was lazy.
  • I offer the option to start the page dump at a certain page name, then I use the pagelinks table to spider through the pages.
  • I change the link contents in the pages if the filename required changing.

-- PaulJStewart - 12 Dec 2006

Nice. It would be helpful to get an initial version of a converter packaged and published at MediaWikiToTWikiAddOn. See AddOnPackageHowTo.

-- PeterThoeny - 13 Dec 2006

I've done a complete conversion package with a plugin interface for custom data conversion. Will package it soon.

-- MichaelDaum - 14 Dec 2006

Excellent, I am looking forward to seeing your package listed in the Plugins web!

-- PeterThoeny - 14 Dec 2006

For now I created a MediaWikiToTWikiAddOn placeholder topic linking to here.

-- PeterThoeny - 09 Jan 2007

MichaelDaum or JohnSupplee - my company will probably be migrating a Mediawiki installation to TWiki soon. Would either of you be able to provide some instructions for your converter script (including at least some pointers to MySQL documents that would be helpful to newbies)? We seem to have lost CraigMeyer...

-- JohnWorsley - 16 Mar 2007

Hi John ;-) - not lost, just busy at the day job... We have converted a ~1000-page MediaWiki to TWiki.

-- CraigMeyer - 18 Mar 2007

Craig - If you're too busy to upload as a TWiki add-on, maybe you could just attach the source code here?

-- RichardDonkin - 18 Mar 2007

Here is my take on converting mediawiki2twiki. This has been funded by a client of mine to convert a huge pile of technical documentation to TWiki. It operates on an XML export of the MediaWiki database and is quite feature-complete. You can even plug in extra conversions; that feature was used to clean up UTF-8 encoding quirks specific to that client's data.

-- MichaelDaum - 18 Mar 2007

I moved the above text from Codev.MediaWikiConversion to here.

Michael: Thank you very much for sharing this converter with the TWikiCommunity! Could you package it and post it to the MediaWikiToTWikiAddOn topic?

-- PeterThoeny - 18 Mar 2007

Michael: I tried your script out in debug/dry run mode and got an error:

$ perl -I /var/apache/htdocs/twiki/bin tools/mediawiki2twiki.pl --file dump.xml --debug --dry --max 10
DEBUG: opening dump.xml

not well-formed (invalid token) at line 182, column 14, byte 4725 at /usr/perl5/vendor_perl/5.8.4/i86pc-solaris-64int/XML/Parser/Expat.pm line 616
XML::Parser::ExpatNB::parse_more('XML::Parser::ExpatNB=HASH(0x88604bc)', '<mediawiki version="0.1" xml:lang="en">\x{a}<page>\x{a}<title>1323 Ve...') called at /usr/perl5/site_perl/5.8.4/Parse/MediaWikiDump.pm line 226
Parse::MediaWikiDump::Pages::parse_more('Parse::MediaWikiDump::Pages=HASH(0x892605c)') called at /usr/perl5/site_perl/5.8.4/Parse/MediaWikiDump.pm line 195
Parse::MediaWikiDump::Pages::init('Parse::MediaWikiDump::Pages=HASH(0x892605c)') called at /usr/perl5/site_perl/5.8.4/Parse/MediaWikiDump.pm line 42
Parse::MediaWikiDump::Pages::new('Parse::MediaWikiDump::Pages', 'dump.xml') called at tools/mediawiki2twiki.pl line 234
Converter::new('Converter', 'webMapString', '', 'plugin', '', 'topicMapString', '', 'maxPages', 10, ...) called at tools/mediawiki2twiki.pl line 179
Converter::main() called at tools/mediawiki2twiki.pl line 1200

I have version 2.34 of XML::Parser::Expat, which is the current one. I don't know enough Perl to tell whether the problem is with the script or the XML I got from Mediawiki's export. We're running Mediawiki 1.4.7. The XML output had extra stuff in it designed to facilitate navigating the output via the browser, and I removed all of that so that each tag starts at the beginning of a line.

-- JohnWorsley - 19 Mar 2007

MediaWiki 1.4's XML export is at a very early stage and basically totally broken. You need to upgrade to MediaWiki 1.5 or higher first. These are the steps to do it:

  1. create a MySQL dump
  2. install MediaWiki 1.5
  3. import the SQL dump
  4. run the maintenance/update.php script to upgrade the database schema from 1.4 to 1.5

Only then is the produced dump.xml valid enough to be processed by XML::Parser::Expat.

-- MichaelDaum - 20 Mar 2007

I installed MW 1.6.9 and (in short) created a copy of the 1.4.7 database to point 1.6.9 at. I ran through the MW browser configure process, and ran the maintenance/update.php script. All seemed to go well, and I can now view the same content in 1.6.9.

But when I run mediawiki2twiki on the 1.6.9 XML dump, I get the same error I posted above; the only real difference seems to be the first line, which is now:

not well-formed (invalid token) at line 294, column 7, byte 7457 at /usr/perl5/vendor_perl/5.8.4/i86pc-solaris-64int/XML/Parser/Expat.pm line 616

Any ideas what to try next?

I should make sure I'm doing the XML dump right: the output I get from Special:Export starts with a line "This XML file does not appear to...", and the XML display has dashes to the left of some tags so you can hide sections. I have been assuming all I want is the XML tags, so I copy the page content from the browser and save it to a file, then remove the header and the dashes and the tabs that indent some lines; then all I have in the file is lines that start with XML tags. Is that the right way to go?

-- JohnWorsley - 23 Mar 2007

No. Please go to the maintenance subdirectory of your mediawiki installation and execute

php5 dumpBackup.php --current > dump.xml
This script dumps the wiki database into an XML interchange wrapper format for export or backup.

The first line of the xml file should be something like this:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="de">

Don't use the browser interface to create the XML backup, because all kinds of unwanted encoding problems can occur when the data gets loaded and copied from and to the browser, depending on the browser you are using.

-- MichaelDaum - 26 Mar 2007

Thanks Michael. I wound up having to touch dump.xml first because for some reason, when I ran the command you provided, bash complained that dump.xml didn't exist. Aside from that, I got the file and ran your script on it, and all seems to have gone well.

Happily our Mediawiki users haven't done anything more complicated than tables, so once I installed MediaWikiTablePlugin, the imported MW files look good. The only thing I noticed that didn't get handled: links to topics that don't exist yet were not preserved.

-- JohnWorsley - 27 Mar 2007

Here, for the benefit of other folks who want to use Michael's mediawiki2twiki script, is a summary of the process I went through to convert our Mediawiki to TWiki:

  1. Install Parse::MediaWikiDump on the TWiki server so the conversion script will work.
  2. Upgrade our MediaWiki to version 1.6.9 in order to get a valid XML dump (ah, the irony).
  3. Run php dumpBackup.php --current > mediawiki-dump.xml from the MW maintenance directory to produce the XML dump.
  4. Transfer the XML dump file to the TWiki server, and do the remaining steps there.
  5. Run perl -I bin tools/mediawiki2twiki.pl --file mediawiki-dump.xml --debug --dry --max 10 from the base twiki directory to do a test run and make sure it's basically working.
  6. Run perl -I bin tools/mediawiki2twiki.pl --file mediawiki-dump.xml --topicmap 'Main Page=WebHome' --debug from the base twiki directory to do the actual conversion. This is what creates all the TWiki topics, one per MW page. The script does contain some brief documentation of the available parameters, so have a peek inside to see if you want to use any of them.

I also chose to install several TWiki plugins to better support the converted MW topics:

Here's the full list of CPAN modules required (some of them may have their own dependencies): TWiki::Time, Digest::MD5, File::Copy, Getopt::Long, Pod::Usage, Parse::MediaWikiDump, Encode, Carp

Our MW doesn't have anything fancier than tables - no images, no custom namespaces, no templates, etc. - so I have no idea how well the script would work for stuff like that. But for ours, the only thing it didn't handle was links to non-existent pages; it converted them into text.

-- JohnWorsley - 28 Mar 2007

Thanks for sharing this, John, great feedback "from the trenches". Does anybody have time to document and package this properly?

-- PeterThoeny - 29 Mar 2007
