Tags:
create new tag
, view all tags

ExtractingTopicDataToADatabase

-- AlanBurlison - 20 May 2004

Summary

I've been using TWiki for a workflow-type application, and I wanted to create some summary report pages to track progress, look for missing information and spot errors. I also needed to be able to extract key information (e.g filenames and status) for processing in another application, and although there are plugins that give database-like access to the TWiki topic data (e.g. ADatabaseCalledTWiki) they crucially don't allow external applications to access the TWiki data using SQL. I rapidly hit the limits of the TWiki %SEARCH{...}% mechanism, especially when dealing with repeating groups in a topic (e.g. lists of files), so I ended up extracting the key information into a set of tables in a SQLite database using the perl DBI interface. This was really remarkably easy to do, and allowed me to combine the unstructured-but-flexible nature of a Wiki with the structured-but-inflexible attributes of a database. I know there's been much discussion of the interplay between TWiki and databases, so I thought I'd write up my experiences in the hope that they'd help others and stimulate some discussion.

Related topics

WorkFlow, WorkFlowAddOn, TwikiMetadata, DatabaseForMETA, MetadataSucks, PluginDataStore, TWikiWhatWillYouBeWhenYouGrowUp, PerlDBI

Example topic data

I used forms and templates along with a series of customised 'create a new topic' forms to impement 'typing' of topics. One of the topic types that I wanted to query against contained a list of filenames, and I wanted to extract each filename into a seperate row in a datbase. The raw topic text looks something like:

---+ <nop>%TOPIC%

---++ Owner

Fred Bloggs

---++ Summary

This topic blah, blah, blah....

---++ Files

/for/bar/baz %BR%
/some/other/file %BR%
/yet/another/file %BR%
/and/so/on/and/so/forth %BR%

Database mapping of topic data

For the particular topic type shown in the example above, the web name, topic name, the owner, summary block text, the topic metadata and the form data would appear as one row in a table (table1), and for each filename a row containing (web, topic, filename) would appear in a detail table (table2). There are several other topic types which map onto different table schemas, depending on the contents and the presence of repeating groups etc. All the various tables use as a minumum (web, topic) as the primary keys.

Query interface

I'll show the query interface first, then explain how I got the data into a database. I tried to keep the syntax of the queries as close as possible to that of %{SEARCH{..}% variable. A typical query embedded in a page looks like this:

%QUERY{"select t1.owner, t1.topic, t2.filename from table1 t1, table2 t2 where t1.web = 'Myweb' and t1.web = t2.web and t1.topic
= t2.topic order by 1, 2, 3" break="1" header="---+++ Owner: #1\n| <nop>*Topic* | <nop>*Filename* |" format="| #2 | #3 |"}}%

The query attribute specifies the SQL to be used to query the database, the break attribute specifies the report break column (if required), the header attribute specified the header to be used on the table/break block, and the format attribute allows you to specify the format for each table row. In the header and format attributes each #n is replaced with the appropriate column from the query result. The query in the example above would expand to something along the lines of:

Owner: Fred Bloggs

Topic Filename
TopicOne /for/bar/baz
TopicOne /some/other/file
TopicOne /yet/another/file
TopicOne /and/so/on/and/so/forth
TopicTwo /somewhere/else
TopicTwo /dev/null

Owner: Peter Thoeny

Topic Filename
TopicThree /twiki/src/code

And so on.

Data extraction from TWiki

This is done using a custom plugin, and by providing a beforeSaveHandler that looks at the form type of the topic and passes control to the appropriate per-topic-type handler, e.g.

   if ($_[0] =~ m{%META:FORM{name="TypeOneForm"}%}) {
      scan_type_one(\$_[0], $Topic);
   } elsif ($_[0] =~ m{%META:FORM{name="TypeTwoForm"}%}) {
      scan_type_two(\$_[0], $Topic);
   }

The individual type handlers then delete any existing rows for the topic from the appropriate tables, parse the topic text and insert the new rows into the tables.

Query rendering

The query rendering is done in a startRenderingHandler. The code is relatively simple, the bulk of it being concerned with parsing the query parameters.

Other issues

The only wrinkle is that TWiki doesn't provide a 'rename topic' handler, so if a topic is deleted the rows still appear in the database. I worked around this by writing a cron job that once a day scans all the topics in the Web, rebuilds the tables and deletes the rows for topics that no longer exist.


Alan, this is very interesting. Is this plugin (plus some sample code interacting with the SQL database) something you could share with the wider community?

-- ThomasWeigert - 20 May 2004

The plugin has lots of proprietary goop in it so I can't share the whole thing, but I can post the bits that do the database stuff. Unfortunately it's not completely generic as it knows the layout of the various template topics, but some of the stuff (e.g. the query handler) could be easily resused. I'll attach the stuff to this topic later on. -- AlanBurlison - 20 May 2004 - 11:34

-- AlanBurlison - 20 May 2004

You may want to have a look at FormQueryPlugin, which caches form data in a database for rapid queries using a cut-down SQL.

-- CrawfordCurrie - 20 May 2004

Thanks, and I had looked at it, but it is a) limited to form data, b) isn't full SQL and c) doesn't allow for the data to be queried outside of TWiki, which are my primary requirements.

-- AlanBurlison - 20 May 2004

OK. Just one clarification; the FormQueryPlugin is not limited to form data. It reads structured tables from the topic, and also stores the topic text. Points (b) and (c) are both good ones! This is a great idea, and I'm interested in finding out more, especially how you organised the database. I have wanted to see the TWiki Store implemented in a database (as an alternative to the text form) for some time. Caching the text DB into an SQL database on the fly is an intriguing idea that had not occurred to me.

  • What DB did you use?
  • How did you cope with new form types being added?
  • And how did you do the SQL? Did you parse the query in the perl (e.g. via SQL::Statement), or throw it straight at the DB?
    • If in the perl, did you use one of the CPAN SQL parsers?
(Just FYI, SharedCode includes a standard parameter parser module - Attrs.pm - that might have saved you some code, though it wouldn't help with parsing SQL)
  • Do you have any comments on the performance of the beforeSaveHandler?
  • Does the %QUERY writer have access to queries already cached in the DB?

-- CrawfordCurrie - 21 May 2004

This looks very similar to the latest setup I have at my company.

I wrote a Plugin that is a wrapper around SQLite using Perl DBI (basically to reduce some common code to a single function call), and another Plugin to store form metadata in the database.

What I did was to create (dynamically) one table per form. A sample DDL is as follows:

CREATE TABLE TicketLifecycleForm (EntryType TEXT,Summary TEXT,TicketPriority TEXT,AffectsRelease TEXT,Description TEXT,Date TEXT,web TEXT, topic TEXT);
Also, an index/PK over (web,topic) is created.

The data is extracted and saved on the afterSaveHandler handler. I did this because the META info may be changed on the beforeSaveHandler handler. If the table does not exist, it is created. The plugin also provides a rest action to process a whole web.

My current implementation has some limitations:

  • It is not possible to two forms with the same name and different structure on different webs.
  • If the topic has some FORMFIELDs that is not in the form definition, the insert will fail.
  • The plugin is not managing modifications in the structure of the Form.

If someone is interested I can take time to clean up the code a little and publish them to SVN.

-- RafaelAlvarez - 18 Aug 2008

Edit | Attach | Watch | Print version | History: r8 < r7 < r6 < r5 < r4 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r8 - 2008-08-18 - RafaelAlvarez
 
  • Learn about TWiki  
  • Download TWiki
This site is powered by the TWiki collaboration platform Powered by Perl Hosted by OICcam.com Ideas, requests, problems regarding TWiki? Send feedback. Ask community in the support forum.
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.