Good printing is an important requirement for us and, I guess, for others: think TWikiForBookAuthoring, DocBook.
I came up with a solution using HTMLDOC, a GPLed tool from http://www.easysw.com/htmldoc/
The idea is simple:
- The view script pipes its output into the htmldoc converter tool; I added the option pdf=on
- You will most likely want an extra skin, like view.pdf.tmpl
- The htmldoc call and options come from the template pdfcall.tmpl. The absolute minimum is something like /usr/bin/htmldoc -t pdf --book %PDFBODY%
  See http://www.easysw.com/htmldoc/htmldoc.pdf for all the options available.
- The optional title page layout comes from pdftitle.tmpl. A temporary file %PDFTITLE% holds the HTML output needed for the htmldoc call.
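As a rough illustration of the template idea (this is not the attached implementation, which is not reproduced here), the placeholders in pdfcall.tmpl could be filled in along these lines. The helper name expand_pdfcall and the temp file name are made up for the example; only %PDFBODY%, %PDFTITLE% and the minimal htmldoc call come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the %PDFBODY% and %PDFTITLE% placeholders in
# the pdfcall template with the paths of the temporary HTML files.
# Placeholders without a value are left untouched so that mistakes in the
# template stay visible.
sub expand_pdfcall {
    my ($template, %file) = @_;
    $template =~ s/%(PDFBODY|PDFTITLE)%/exists $file{$1} ? $file{$1} : "%$1%"/ge;
    return $template;
}

my $call = expand_pdfcall(
    '/usr/bin/htmldoc -t pdf --book %PDFBODY%',
    PDFBODY => '/tmp/body12345.html',    # assumed temp file name
);
print "$call\n";    # /usr/bin/htmldoc -t pdf --book /tmp/body12345.html
```

The expanded string would then be run with the view output already written to the body file. A real script should make sure the file names cannot be influenced by the request, since the string ends up in a command line.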
See the quick-and-dirty implementation based on TWiki20011201.zip in the attachments.
(Note that I renamed the stuff from my previous 20001201 based implementation to avoid confusion with the "print" skin.)
To get an impression, check the PDF printout of the TWiki documentation set. TWikiDocumentation.pdf was built using
.../twiki/view/TWiki/TWikiDocumentation?pdf=on&skin=pdf
Open Issues:
- Search paths for multiple images
- Tmp file mess
- Security?
- Implementation style?
--
PeterKlausner - 11 July 2002
The problem is a wider one that needs an architectural solution to prevent these hacks from getting messy. I think we need to add a Filters mechanism whose purpose is to convert from WikiML into other formats. In this case, you go WikiML -> HTML -> PDF, but this should not always be the case.
--
MartinCleaver - 12 Jul 2002
Zope has something similar to this in its BackTalk "product" (FYI: Zope product = plugin). That environment, written in Python, uses the ReportLab Toolkit software library (also written in Python) to convert structured text into PDF. Interestingly, it goes from the WikiML (or a dialect of it at least) directly into an intermediate form (XML?) that the Toolkit library ultimately turns into PDF. The Toolkit library is open source (GPL-ish?) but I do not want to even ponder what it would take to re-implement something similar in Perl. It may be possible to leverage it as-is (at the expense of requiring Python for PrintUsingPDF on TWiki). More information on the ReportLab Toolkit can be found at http://www.reportlab.org/rl_toolkit.html
Of course this does not solve the problem of having an intermediate file. But it does point at the idea of providing a more flexible and universal intermediate file. This architectural paradigm separates content from presentation. The "view" script would then become one "pretty printer" to render HTML. A different "print" script (or perhaps a view?out=pdf) could be used to render PDF or any other format (e.g. LaTeX). The nice thing about this is that if a user does not like the default "view?out=html" pretty printer script, a new one can be made relatively easily (e.g. view?out=myformat) without modifying the default pretty printer. I could see this being useful for allowing completely customized rendering of web content (a.k.a. skin?)....
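To make the "one pretty printer per format" idea a bit more concrete, here is a toy sketch in Perl. Everything in it is hypothetical: TWiki has no such dispatch table, the node list stands in for whatever intermediate form would be chosen, and no HTML escaping is done.

```perl
use strict;
use warnings;

# Toy intermediate form: a list of [type, text] nodes. A real one would be
# much richer; this only illustrates the dispatch idea.
my @doc = ( [ heading => 'Printing' ], [ para => 'Good printing matters.' ] );

# One "pretty printer" per output format, selected by the "out" parameter.
my %renderer = (
    html => sub {
        join '', map {
            my ($type, $text) = @$_;
            $type eq 'heading' ? "<h1>$text</h1>\n" : "<p>$text</p>\n";
        } @_;
    },
    text => sub {
        join '', map {
            my ($type, $text) = @$_;
            $type eq 'heading' ? uc($text) . "\n\n" : "$text\n";
        } @_;
    },
);

sub render {
    my ($out, @nodes) = @_;
    $out = 'html' unless defined $out;    # view?out=html would be the default
    my $printer = $renderer{$out}
        or die "no pretty printer registered for out=$out\n";
    return $printer->(@nodes);
}

print render('html', @doc);
print render('text', @doc);
```

Adding view?out=myformat would then mean registering one more entry in the table, without touching the default printer.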
--
PeterSanza - 31 Dec 2002
Here is another one based on HTMLDOC:
- Install htmldoc on your twiki machine.
- Copy the following script >>pdf<< into your twiki/bin directory and adjust the three paths specified at the top of the script.
- Now you can view every topic within twiki by substituting "pdf" for "view" in the URL.
I know the script looks terrible. I am not a Perl guy, so I don't know how to do better. It's just a dirty hack. But the "temp file mess" is solved and there are a couple of parameters you can pass along. These are (currently):
| Parameter | Values | Description |
| format | {ps1,ps2,ps3,pdf11,pdf12,pdf13,pdf14,html} | htmldoc '-t' parameter. Specifies the output format. Default is 'pdf14'. |
| linkstyle | plain, underline | Defines whether links are rendered underlined or not. |
| toclevels | [numeric] | Number of hierarchy levels to include in the table of contents. Use toclevels=0 to suppress table of contents generation. |
| firstpage | p1,toc,c1 | Specifies which page is displayed initially: the first page with content (p1), the table of contents (toc), or the page with the first chapter's header (c1). |
| size | letter,a4,WxH{in,cm,mm},etc | Page size to be generated (defaults to a4). |
| bodycolor | [html color code] | Background color of the document. |
| browserwidth | [numeric] | A very useful parameter: the width in pixels of the browser window that htmldoc assumes, which scales all images. It is not a percentage value. |
| skin | [installed skin] | Your favorite pdf skin, used for generating the document. Defaults to plain. |
| footer | fff | Formatting of the footer. |
| header | fff | Formatting of the header. |
| tocfooter | fff | Formatting of the footer within the TOC section. |
| tocheader | fff | Formatting of the header within the TOC section. |
| shiftHeaders | [number] | Shift all HTML headers, e.g. if you specify 2 here, every <h2> header becomes <h4> when passed to htmldoc. Negative numbers are possible, too. Maximum is 6, as HTML does not provide a deeper hierarchy. |
| titlepg | on,off | Determines whether a title page is generated (defaults to on). |
| orientation | portrait,landscape | Controls the page orientation; default is portrait. There are htmldoc comment directives for this, but it is useful to be able to do it external to the content. |
fff = header/footer format string: three characters that set the left, center and right fields (see the htmldoc documentation).
See the htmldoc documentation for further information on these parameters. Add your own as well (but do not forget to submit your changes).
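The shiftHeaders behaviour described in the table can be sketched as a single substitution. This is not the code from the attached pdf script; in particular, clamping the result to the range 1..6 is an assumption about what "maximum is 6" should mean.

```perl
use strict;
use warnings;

# Shift every <hN> and </hN> tag by $shift levels (may be negative).
# Assumption: levels are clamped to the 1..6 range that HTML provides.
# Tag names come out in lower case.
sub shift_headers {
    my ($html, $shift) = @_;
    $html =~ s{<(/?)h([1-6])\b}{
        my $level = $2 + $shift;
        $level = 1 if $level < 1;
        $level = 6 if $level > 6;
        "<$1h$level";
    }gie;
    return $html;
}

print shift_headers('<h2 class="x">Setup</h2>', 2), "\n";    # <h4 class="x">Setup</h4>
```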
Ah well, here's a little example how to pass the parameters:
http://your.twiki.host.net/path/to/twiki/bin/pdf/YourFavoriteWeb/TheDamnTopicToView?skin=print&toclevels=6&bodycolor=eeeeee
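For anyone who wants to add parameters, the mapping from URL parameters to htmldoc options could look roughly like the sketch below. The function name htmldoc_args is made up and this is not the code of the attached script; the option names are the htmldoc command line options that correspond to the table above. Values are passed through unchecked here, so a real script would have to validate them first, because they come straight from the URL.

```perl
use strict;
use warnings;

# Hypothetical mapping from URL parameters (see the table above) to an
# htmldoc argument list. Returning a list rather than a string lets the
# caller run htmldoc without going through a shell.
sub htmldoc_args {
    my (%p) = @_;
    my @args = ('-t', defined $p{format} ? $p{format} : 'pdf14');
    push @args, '--size', defined $p{size} ? $p{size} : 'a4';

    # toclevels=0 suppresses the table of contents altogether
    if (defined $p{toclevels}) {
        push @args, $p{toclevels} == 0 ? '--no-toc' : ('--toclevels', $p{toclevels});
    }
    push @args, '--no-title'  if defined $p{titlepg}     && $p{titlepg} eq 'off';
    push @args, '--landscape' if defined $p{orientation} && $p{orientation} eq 'landscape';

    # Options whose value is handed to htmldoc unchanged
    for my $name (qw(linkstyle firstpage bodycolor browserwidth
                     header footer tocheader tocfooter)) {
        push @args, "--$name", $p{$name} if defined $p{$name};
    }
    return @args;
}

# Parameters from the example URL above (skin is handled by TWiki, not htmldoc)
print join(' ', htmldoc_args(toclevels => 6, bodycolor => 'eeeeee')), "\n";
# -t pdf14 --size a4 --toclevels 6 --bodycolor eeeeee
```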
--
PatrickOhl - 14 Jan 2003
Of course you can easily 'fake' a twiki pdf skin by forwarding all requests to the other script now. Here is an example template for doing so: >>view.pdf.tmpl<<
--
PatrickOhl - 14 Jan 2003
Updated the pdf script. There are more parameters now. See the table above. These still aren't all the parameters provided by htmldoc. Feel free to append some as you need.
--
PatrickOhl - 17 Jan 2003
Can I just add that this add-on is simply brilliant. Just made my day being able to easily provide pdf files of our technical docs stored in Twiki without needing to install Acrobat Distiller.
Thanks for the work
--
NathanReeves - 11 Feb 2003
I did some documentation a while ago using something called Simple Document Format, SDF for short. It was originally created by "ianc@mincom.com" but seems to have been more or less abandoned (http://www.mincom.com/mtr/sdf/ shows a 404). I believe it was later adopted by the OpenLDAP team, who use it for code documentation.
SDF uses a somewhat wiki-like base syntax and generates documents in other formats (heard that one before?). The first thing that came to my mind was whether any parts of it would be usable for TWiki plugins etc., since the SDF tools happen to be written in Perl.
--
ConnyBrunnkvist - 11 Feb 2003
1. I get an error message when I try to view the page with /pdf/ instead of /view/:
[Wed Feb 12 00:09:42 2003] TWiki.pm: Can't locate TWiki.pm in @INC (@INC contains: ../lib . . /usr/libdata/perl/5.00503/mach /usr/libdata/perl/5.00503 /usr/local/lib/perl5/site_perl/5.005/i386-freebsd /usr/local/lib/perl5/site_perl/5.005 . .) at pdf line 28. BEGIN failed--compilation aborted at pdf line 28.
I tried to muck around with the variables, but to no avail. Any ideas? I am using the January 2003 release.
2. I cannot download the view.pdf.tmpl file, as it creates a pdf file in doing so. Maybe a zipped version would be better.
--
ArthurClemens - 11 Feb 2003
TWikiRelease01Feb2003 has a different way of discovering the Perl libs: it depends on the setlib.cfg file located in the twiki/bin directory. Because of that you need to change other scripts like twiki/bin/pdf:
Change from:
use CGI::Carp qw( fatalsToBrowser );
use CGI;
use lib ( '.' );
use lib ( '../lib' );
use TWiki;
use IO::File;
use POSIX qw(tmpnam);
Change to:
BEGIN { unshift @INC, '.'; require 'setlib.cfg'; }
use CGI::Carp qw( fatalsToBrowser );
use CGI;
use TWiki;
use IO::File;
use POSIX qw(tmpnam);
It would be nice to package these scripts into a Plugins.AddOnPackage
--
PeterThoeny - 12 Feb 2003
Has anyone gotten this to work on the TWikiRelease01Feb2003? With Peter's input I don't get errors anymore, but I only get an empty PDF (0 bytes) as result.
--
ArthurClemens - 13 Feb 2003
If the script runs well TWiki-wise, it is probably htmldoc's fault. It is very picky about the HTML, especially in the block up to the first <H1> or the title sheet. I addressed a few of these problems in the PdfPlugin (which I want to post really soon now). You should try to catch htmldoc's error output and work your way up from a simple test page. You might also try pagemode first.
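One way to catch the error output, sketched under the assumption of a Unix-like system (it uses fork). The function name run_capture and the paths in the commented call are made up. Stderr goes to a temporary file so that it cannot get mixed into the PDF data on stdout.

```perl
use strict;
use warnings;
use File::Temp ();
use POSIX ();

# Run a command and return (exit status, stdout, stderr).
sub run_capture {
    my @cmd = @_;
    my $errfile = File::Temp->new;    # removed when it goes out of scope
    my $pid = open(my $out, '-|');    # fork; the child's STDOUT is our pipe
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {                  # child process
        open(STDERR, '>', $errfile->filename) or POSIX::_exit(126);
        exec { $cmd[0] } @cmd;
        POSIX::_exit(127);            # only reached if exec failed
    }
    binmode $out;
    my $stdout = do { local $/; <$out> };
    close $out;                       # waits for the child and sets $?
    my $status = $? >> 8;
    open(my $err, '<', $errfile->filename) or die "cannot read errors: $!";
    my $stderr = do { local $/; <$err> };
    close $err;
    return ($status,
            defined $stdout ? $stdout : '',
            defined $stderr ? $stderr : '');
}

# Hypothetical call; the test page path is made up:
# my ($status, $pdf, $errors) =
#     run_capture('/usr/bin/htmldoc', '-t', 'pdf', '--webpage', '/tmp/test.html');
# warn "htmldoc said:\n$errors" if $status != 0 || $pdf eq '';
```

Checking for empty output as well as the exit status matters here, because the symptom reported above is a 0 byte PDF rather than an error.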
HTH --
PeterKlausner - 14 Feb 2003
Regarding that Zope/Python product mentioned above: a product I worked on used a similar technique. We converted our data to XML and then ran it through an XSL processor (Apache's Xalan) using XSLT style sheets to produce a document consisting of XSL Formatting Objects. We then ran that through Apache's FOP processor to produce PDF. The process was somewhat slow, but with the advantage that XSL-FO is really designed for printing and gives you the necessary tools to place things intentionally on the page. Also, in theory the tools are replaceable with any other XSL processors, so you're not stuck with one. Basically, what TWiki would need is code to convert its markup to XML, and an XSLT style sheet for that XML. The rest is already done.
--
ChristopherMasto - 14 Feb 2003
Arthur, I've managed to get the output to PDF working with the latest version of TWiki (Feb03). Didn't actually make any changes outside of what Peter suggested.
--
NathanReeves - 13 Mar 2003
Great add-on, love it! Really pushes Twiki ahead for documentation/publishing.
Minor mods to the pdf script:
- if 'toclevels=0', use the --no-toc htmldoc option to suppress toc generation
- include a 'titlepg' param which, if set to off, uses the --no-title htmldoc option to suppress title page generation.
Updated the params table to document these.
--
RobWalker - 28 Aug 2003
See PdfPlugin for the ad hoc version of a plugin; note that the above-mentioned toc-level stuff is controlled via template.
--
PeterKlausner - 01 Sep 2003
Good that you made it a plugin!
I did not get it to work earlier (see above), but it probably chokes on CSS, instead of ignoring it. Is this your experience too, Peter? What exactly are the limitations for page HTML? I understand it manages HTML 3.2. What are the do's and don'ts?
--
ArthurClemens - 01 Sep 2003
Attempted to use PdfPlugin with BeijingRelease, feedback recorded in PdfPluginDev.
Added orientation parameter to the pdf script and updated the parameter table to reflect this.
--
TonyMartindale - 04 Sep 2003
Was having problems with the rendering of topics which had TWikiDrawPlugin pictures in them. I hacked our version of the pdf.cgi script to include the following transformation after the topic text is read, and before the tags are processed:
# mod TWikiDraw tags into conventional attachurl tags
$text =~ s|%DRAWING\{(.*?)\}%|%ATTACHURL%/$1.gif|g;
It's a pretty simple hack: mod the DRAWING tag into an ATTACHURL tag for the .gif file produced by the TWikiDrawPlugin. I'm no Perl/Twiki expert, so I figured it better to include here as a comment for those better and wiser to consider a more robust fix.
--
RobWalker - 16 Sep 2003
Due to my naïve reliance on TWikiSyndication, I missed that one...
Pulling in pictures works with HtmlDoc 1.8.23, AthensRelease.
Accessing pictures via --path is extremely tedious. Fortunately, the newest HtmlDoc pulls images via HTTP. Unfortunately, this requires you to use the full URL. Obviously, the TWikiDrawPlugin doesn't do this and your hack fixes this. I will fix the PdfPlugin to rectify such tags.
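Just to illustrate the "full URL" point: a host-relative image path could be rewritten before the HTML is handed to htmldoc, roughly like this. The function name and the host wiki.example.com are made up, and this is not what the PdfPlugin actually does; it only handles quoted src attributes that start with a single slash.

```perl
use strict;
use warnings;

# Turn host-relative <img src="/..."> paths into full URLs so that htmldoc
# can fetch the images via HTTP. URLs that are already absolute, and
# protocol-relative ones starting with "//", are left alone.
sub absolutize_img_src {
    my ($html, $host_url) = @_;
    $host_url =~ s{/+$}{};    # drop any trailing slash
    $html =~ s{(<img\b[^>]*?\bsrc\s*=\s*)(["'])/(?!/)}{$1$2$host_url/}gi;
    return $html;
}

print absolutize_img_src(
    '<img alt="x" src="/twiki/pub/Main/MyTopic/pic.gif">',
    'http://wiki.example.com/',    # hypothetical host
), "\n";
# <img alt="x" src="http://wiki.example.com/twiki/pub/Main/MyTopic/pic.gif">
```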
As to the do's and don'ts:
- The summary up to the first <h1> shouldn't contain any HTML at all
- The title page template is very fragile; it takes trial and error
- The body seems fairly tolerant
- CSS works for me, i.e. it is simply ignored. Partly I use this to make online-only stuff invisible. But for tables and other environments I'd like to retain some formatting, so I'm not for TWikiUsingCSS only. Problems might come if you have style sheets in the body. Never used those. Just linked the sheet from the <head> and added a few class= and <span>s.
--
PeterKlausner - 17 Sep 2003
Anyone got this working with mod_perl on Win32? I'm getting htmldoc running, but it never actually spits data out to the browser. I have to stop and restart Apache to get htmldoc to stop. I had it working fine with Cygwin prior to my testing with mod_perl.
--
NathanReeves - 03 Oct 2003
I got it working on Win32. I had to install the free win32 version of HTMLDOC, otherwise it would spin forever looking for fonts, etc. I lucked out on this by not unlinking the temp html files and trying to manually run htmldoc. I also had a little trouble with temp file locations on XP. The one thing that it doesn't seem to be doing is walking the entire topic web; it just renders the first page. Also, I had to use the --webpage option and remove the compression arguments in pdfcall.
--
GeorgePeden - 19 Oct 2003
I installed HTMLDOC and pdf.pl successfully, but now I'm having some stylistic issues regarding setting typesize for headlines and body text, as well as changing the document title shown in the header. Does anyone have suggestions?
--
ChristianSchmidt - 03 Mar 2004
The pdf script doesn't work out of the box with TWikiRelease01Sep2004 for two reasons. The first is the one PeterThoeny covered in Feb 2003: the Perl libs are now discovered in a different manner. The second is that TWiki::getRenderedVersion has moved into Render.pm and is therefore now TWiki::Render::getRenderedVersion. I have uploaded a patch, pdf.20040901.patch, which fixes both problems. Apply it to the pdf script thus:
patch pdf pdf.20040901.patch
--
DaveKnight - 11 Nov 2004
TWiki::Render::getRenderedVersion is an undocumented function. Please use TWiki::Func::renderText instead.
I have not looked into details, but isn't the functionality now covered by the PdfPlugin?
--
PeterThoeny - 12 Nov 2004
Yes, to some extent. But IMHO, PdfPlugin is best suited to producing a book and the pdf script is better for a single page. I want both. PdfPlugin also requires some patches to work in TWikiRelease01Sep2004. I've been preparing patches for both. I'll double-check that they use TWiki::Func::renderText before submitting.
Anyway, thanks Dave and Peter!
--
KaoruMaeda - 12 Nov 2004
Am I missing something, or is HTMLDOC no longer Open Source?
--
ChrisHogan - 11 Jan 2005
13 is not always unlucky... just weird...
http://www.easysw.com/htmldoc/faq.php?13#13
--
MartinCleaver - 11 Jan 2005
Point to note: htmldoc 1.8.24 has a new CGI feature. On our servers this would dump bad info back and make the plugin fail. We downgraded to 1.8.23, which predates the CGI features of htmldoc, and this gave us no errors.
http://www.htmldoc.org/software.php?VERSION=1.8.23
--
TerryRankine - 31 Jan 2005
For what it's worth (and not trying to step on anyone's toes), I've done a significant re-write of the pdf script idea to better integrate it with the TWiki rendering operations and preference variables. If you're interested, I published it at GenPDFAddOn. It's very much a beta version, so I welcome comments, etc.
--
BrianSpinar - 02 Feb 2005
Setting an environment variable in the pdf script, somewhere before the call to htmldoc, makes the script work again with htmldoc 1.8.24:
$ENV{'HTMLDOC_NOCGI'}=1;
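If you would rather not change the environment for the whole script, the variable can be set only around the htmldoc call. A sketch, assuming a Unix-like system (list-form pipe open); the function name is made up and the command is whatever your script already builds:

```perl
use strict;
use warnings;

# Run a command with HTMLDOC_NOCGI set only for the child process.
# "local" restores (or removes) the variable when the sub returns.
sub run_without_cgi_mode {
    my @cmd = @_;
    local $ENV{HTMLDOC_NOCGI} = 1;
    open(my $out, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    binmode $out;
    my $data = do { local $/; <$out> };
    close $out;
    return defined $data ? $data : '';
}

# Hypothetical call; the paths are made up:
# my $pdf = run_without_cgi_mode('/usr/bin/htmldoc', '-t', 'pdf', '--webpage', '/tmp/topic.html');
```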
--
HubertWeikert - 08 Dec 2005
This script doesn't work with twiki 4.0.4, because internal functions have moved again. I've attached a patch which makes it work again, as well as improving the rewriting of image links. I went ahead and changed almost all of the TWiki:: function calls into TWiki::Func:: calls in the hopes of improving the script's robustness against future upgrades.
You need to apply the previous patch before applying my patch. Download both the patches, and then say:
patch pdf < pdf.20040901.patch
patch pdf < pdf.4.0.4.patch
... and you should be good to go.
--
AndrewMoise - 15 Aug 2006
Whoops -- someone pointed out that I accidentally uploaded the wrong file as pdf.4.0.4.patch. I've now uploaded the actual patch.
--
AndrewMoise - 18 Oct 2006