Tags:
create new tag
, view all tags

How to Export TWiki Data in CSV Format for Use in Excel

2011-11-11 - 21:19:10 by PeterThoeny in Development
Get Involved!
TWiki is an open source project with 10+ years of history, built by a team of volunteers from around the world, and used by millions of people in over 100 countries. The community is focusing on building the best collaboration platform for the workplace. We invite you to get involved!
Sometimes it is desirable to export TWiki forms based data from TWiki applications. CSV (comma-separated_values) is a simple file format that is widely supported by many applications, including spreadsheet programs such as Excel. This blog introduces the CSV format and explains how to generate a CSV file on the fly in a TWiki application.

The CSV format is not clearly defined, there is no real CSV standard. In common usage almost any delimiter-separated text data may be referred to as a CSV file. Traditionally, however, lines in the text file represent rows in a table, and commas separate the columns. This traditional understanding is defined in RFC 4180, which is the best known effort to formalize a CSV standard.

In its most basic form, each record is a line, and fields are separated by a comma. Here is an example that contains a header row representing the field names, and two records:

First name,Last name,Age
Jimmy,Neutron,14
Mickey,Mouse,16

Fields may contain commas, double-quotes, and line breaks (CRLF); in which case they should be enclosed in double-quotes. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double-quote. Example:

Field A,Field B,Field C,Field D
normal field,"field, with, commas","field with
line feed","field with ""quoted"" text"

Now that we understand the basics of CSV, let's see how we can generate a CSV file from TWiki content. As long as content is structured using TWiki forms it is easy to generate reports on the fly using a SEARCH variable. The search can be formatted to produce output in CSV format.

Let's look at the ContactDbAddOn, which is a contact database. Each record is a TWiki page that has a ContactForm with fields Salutation, Name, Job Title, Company, Phone, Mobile, Fax, Email, and URL.

Here is a search that shows all records using the default search format:

%SEARCH{
 "form.name = 'ContactForm'"
 type="query"
 excludetopic="*Template"
 nonoise="on"
}%

Now let's add a search format to generate the CSV output. We add a header line containing the names of the fields (the format parameter should be on one line) :

%SEARCH{
 "form.name = 'ContactForm'"
 type="query"
 excludetopic="*Template"
 nonoise="on"
 header="Salutation,Name,Job Title,Company,Phone,Mobile,Fax,Email,URL"
 format="$quot$formfield(Salutation)$quot,$quot$formfield(Name)$quot,$quot$formfield(Job Title)$quot,$quot$formfield(Company)$quot,
$quot$formfield(Phone)$quot,$quot$formfield(Mobile)$quot,$quot$formfield(Fax)$quot,$quot$formfield(Email)$quot,$quot$formfield(URL)$quot"
}%

export-csv-600.png This search already produces the proper CSV output with fields enclosed in double-quotes. There are some limitations though:

  1. Double quotes in fields are not escaped - they should be escaped.
  2. WikiWords in form fields are automagically expanded into links - they should be escaped.
  3. Even if a page only contains the SEARCH, the TWiki skin adds headers with pulldown menus and footers - all output except the SEARCH should be suppressed.
  4. Content is delivered as HTML, e.g. shown in the browser - the user expects to see a "save file" dialog box.

1. Double quotes in fields are not escaped: TWiki currently has no easy way to escape a double-quote into two double-quotes, so this is a limitation we have to live with.

2. WikiWords in form fields are expanded into links: We can prevent that by enclosing the SEARCH in <noautolink> tags:

<noautolink>
%SEARCH{
  ...
}%
</noautolink>

3. Even if a page only contains just the SEARCH, the TWiki skin adds headers with pulldown menus and footers: We can suppress all output except the SEARCH by specifying the text skin and a content type of plain text. Assuming we have a ContactsAsCSV page in the Sandbox web that contains just the <noautolink> tag and the SEARCH, we get the plain text result by adding these URL parameters to the page URL:

http://example.com/do/view/Sandbox/ContactsAsCSV?skin=text;contenttype=text/plain

4. Content is delivered as HTML, e.g. shown in the browser: The user expects to see a "save file" dialog box. We change the content type from plain text to text/csv, and set the content-disposition to attachment; filename=Contacts.csv as defined in RFC 1806 to suggest the file name Contacts.csv in the save-as dialog box. Here is a link that points to the ContactsAsCSV page with proper parameters (the link should be all on one line) :

<a href="http://example.com/do/view/Sandbox/ContactsAsCSV?skin=text;
contenttype=text/csv%0Acontent-disposition%3A+attachment%3B+filename%3DContacts.csv" target="_top">Export to CSV file</a>

That's basically it. We can further improve this, such as generating a suggested file name with current date. You are invited to download and install the ContactDbAddOn, it has a working export to CSV feature.

References:

Comments

.

Edit | Attach | Watch | Print version | History: r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r2 - 2011-11-14 - PeterThoeny
 

Twitter Delicious Facebook Digg Google Bookmarks E-mail LinkedIn Reddit StumbleUpon    
  • Help
  • Learn about TWiki  
  • Download TWiki
This site is powered by the TWiki collaboration platform Powered by Perl Hosted by OICcam.com Ideas, requests, problems regarding TWiki? Send feedback. Ask community in the support forum.
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.