This topic is related to
PackageTWikiTopic, except that this is a design discussion whereas
PackageTWikiTopic should be more about implementation specifics (API, etc).
Following from some recent email discussions related to dealing with file attachments, here are some ideas that bring together Webs, Topics,
SubTopics and
FileAttachments, plus Plugins and the Render pipeline.
This is my summary of a discussion with
PeterThoeny,
JohnTalintyre,
FrancoBagnoli and
AndreaSterbini. Most of this requires the large changes occurring in
TWikiModularizationCvsBranch in order to meet some of the ideas in
HowShouldTWikiBeModularized,
ModularizationAndCommonTwiky and
PackageTWikiStore.
Topics related to this include:
PageCaching,
AttachmentsUnderRevisionControl,
TWikiPlugin
Note: This is a large initial content dump; some of the content will probably be farmed out to other topics.
From a practical viewpoint, I'm inclined to agree, especially about
the sub-webs. However, from a conceptual viewpoint I feel that a topic is
itself a sub-web when it has attachments. Additionally I feel that we
should move most stuff out of the pub document route.
So in summary, I'm happy to go with the pub area for now and re-think for the
future.
--
JohnTalintyre 27 Mar 2001
I'm inclined to this view. I also wonder whether i) it's not possible to just
dynamically generate file attachment information on view, and ii) create an
index file if necessary to speed up the generation.
In fact you could base a
dynamic file attachment table mechanism on a
generic subweb information provision mechanism, with an implicit call to the
file attachment sub topic if it exists.
i.e. pseudo code:
if exists TWiki::Topic->subweb("attachment subweb name") then
    foreach TWiki::Topic->webiterator("attachment subweb name") # returns topic handle
        render_file_attachment_row
    # end foreach
# end if
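The same idea can be sketched in Python (purely illustrative: the Topic and Web classes and the "attachments" subweb name are invented placeholders, not TWiki's actual API):

```python
# Sketch of a dynamic file-attachment table; Topic, Web and the
# subweb name "attachments" are invented placeholders.
ATTACH_SUBWEB = "attachments"

class Topic:
    """Placeholder topic handle carrying attachment meta data."""
    def __init__(self, name, size, author):
        self.name, self.size, self.author = name, size, author

class Web:
    """Placeholder (sub)web: a named container of topic handles."""
    def __init__(self, topics):
        self._topics = topics
    def topics(self):
        return iter(self._topics)

def render_attachment_table(subwebs):
    """Render one table row per topic in the attachment subweb, if any."""
    subweb = subwebs.get(ATTACH_SUBWEB)
    if subweb is None:
        return []            # no attachment subweb: nothing to render
    return [f"| {t.name} | {t.size} | {t.author} |" for t in subweb.topics()]

subwebs = {ATTACH_SUBWEB: Web([Topic("patch.diff", 1024, "NicholasLee")])}
rows = render_attachment_table(subwebs)
```

The point of the sketch is that the table is derived on view from whatever topics the attachment subweb happens to contain, so no stored table needs updating when attachments change.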
Note: I'm floating some ideas around my head with regards to (TWiki::)Web
and (TWiki::)Topic, where say a Web is just the primary node of a Topic
tree.
--
NicholasLee - 27 Mar 2001
One thing I'm not sure about is the relationship of attachments to
topics.
Are they completely different or essentially the same thing?
Also, what about meta information for a topic?
If a web is simply a topic at the top level:
- at present we have reserved topic names for meta information, e.g. preferences
- top-level meta information lives in a specific web (TWiki) - a bit messy
What about meta-information for attachments:
- currently a fixed set in a table
- security information - nowhere for this at present
--
JohnTalintyre 27 Mar 2001
In some sense I think attachments are Topics in their own right, since
they contain a certain separate information content. Of course it's a
thin line between, say, an image file that is part of a Topic's 'text' and,
say, a Word or patch file which is 'additional' information with a sub
context.
My thoughts on a meta-information mechanism are still evolving.
Consider the few words I posted in
PackageTWikiStore.
I think that it's not too hard to add a generic mechanism like:
$topic_handle->meta("key", "text")
or
$topic_handle->meta("key", ["text", "text"])
Of course this doesn't provide very well for the 1-to-1 author/version
array and one-context-key situation.
Although we could probably fix that by providing:
$topic_handle->meta("key", {"text" => "text"})
as well.
Now the TWiki::Store sub system would be required to provide a
mechanism to the higher level which provides generic storage of this
meta information, whether it be into a
RCS file or DBI.
{Note: with a
RCS file, I'm thinking its easy enough just to store it in
the desc and version comment fields.}
Of course for syntax sugar reasons, basic TWiki meta-information
requirements would have their own interface, which might or might not
sit on top of the ->meta mechanism.
i.e. ->version($version_key) and ->versions() might access the stored
data directly. Particularly in the author/version
RCS case, the
information would be stored in different fields in the data file.
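A rough Python sketch of such a generic ->meta mechanism, with a sugar method sitting on top of it (all class and method names are illustrative, not the actual TWiki interface):

```python
class TopicHandle:
    """Illustrative topic handle with a generic meta-information store;
    values may be a scalar, a list, or a 1-to-1 mapping, as above."""
    def __init__(self):
        self._meta = {}

    def meta(self, key, value=None):
        """Write meta data when a value is given, otherwise read it back."""
        if value is not None:
            self._meta[key] = value
        return self._meta.get(key)

    def versions(self):
        """Syntax sugar sitting on top of the ->meta mechanism."""
        return self.meta("versions") or {}

t = TopicHandle()
t.meta("parent", "WebHome")                    # scalar value
t.meta("keywords", ["cache", "attachments"])   # list value
t.meta("versions", {"1.2": "JohnTalintyre"})   # version -> author mapping
```

The sugar methods could equally bypass ->meta and read the RCS fields directly; the sketch only shows the layering idea.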
Attachments are an important part of TWiki; in fact I think they tie in
nicely with a
SubTopic system. So the question is, what meta-information
is core to rendering them effectively in TWiki, and do we need an
additional mechanism on top of something like meta(..) to deal with this?
From an implementation point of view, way down the road, it might be
nice in a fully objectized code base to have a generic topic/subtopic
handle that 'knew' whether it was a standard topic/subtopic or a file
attachment topic. Or quite possibly something else, if we add that
functionality.
--
NicholasLee - 28 Mar 2001
Interesting point about objectizing webs / topics /
attachments and looking at all content as a tree.
We can certainly make things cleaner that way.
Open questions are
- performance
- how to handle meta data per web (especially the category table)
- where does search look in regards to the current tree node
--
PeterThoeny - 28 Mar 2001
Performance
In the current
CGI environment, probably a 10-20% hit on invocation. In
mod_perl, we'll probably see that disappear as the code is cached.
Note that you couldn't really write the current twiki-style code in a
mod_perl environment very well. Debugging would be hell. I'd think
that it would basically turn out objectized.
That's without regarding the run-time optimisation tricks you can play in
a persistent code environment, caching indexes in shared memory, etc.
I'd think also that in most cases the object
CGI version would be fine,
and that if someone needs speed for their 1000 users then they'll have
time to fine-tune a mod_perl installation.
Web meta data
Search tree
Since it's a tree, I guess we should say any search function searches up
the branch to the end node. I guess in the long run it depends on the
indexing tech we use.
--
NicholasLee - 30 Mar 2001
With Franco (Bagnoli) we are making a Latex plugin using latex2html. It is rather heavy, because it must produce a lot of gifs for formulas not displayable through normal html.
So, Franco has made a first attempt at a cache mechanism ... for a version of twiki that's at least 1 year old. His initial comments on the topic were at
PageCaching
We want to get in sync with the new twiki release.
Is there any plan to have a general caching mechanism in the new
experimental Store branch?
--
AndreaSterbini - 30 Mar 2001
This led to these comments: it depends on what you mean by a page cache.
Is it a version of the rendered page stored in memory? Then the storage system doesn't really care.
There is a small issue since the plugin system is unrelated to the storage system (well, trying to be), but the latex rendering occurs in the plugin.
i.e. in order to cache, there has to be a way for the latex plugin to delegate latex rendering requests to the page cache.
--
NicholasLee - 30 Mar 2001
I developed a hackerized version of twiki in order to allow people with
scientific interests to share ideas (both in teaching and in research).
One of the most important consequences of this approach is the need to
write mathematics in twiki. And the only way to do it is with latex,
which is the standard tool in mathematics and physics (but see later).
After many experiments, I decided that
latex2html is the most affordable tool: it can handle almost any latex
construct, since as a last resort it calls latex and then ghostscript
to have mathematics translated into gif or png images. The drawback of
this approach is that it is quite slow, especially in the first
run. Indeed, latex2html is smart: it holds a hash of mathematical
expressions that it cannot handle directly, associated with the resulting gif
image. Only when the mathematics changes is the resulting image
regenerated. And if the same mathematics is called more than once in
the document, it is converted only once.
Clearly, I need a cache directory for each page where the intermediate
files are stored. What I do is the following: I use a %dependences hash,
initialized with the wiki. In the readFile subroutine, the newly read file is
added to %dependences (I use a hash so as to check for multiple inclusions
of the same file), and the %dependences hash is then saved in the cache
directory. When view is called, it first checks whether the modification time
of any dependence is newer than that of the cached page, and if so the page
is regenerated (similar to a makefile, which I also used in an early
version). Since the page actually depends on the user's preferences (I'm
planning to build a real international version of twiki), I only cache the
"body" part. The page is always rebuilt if it contains information about
date, username, etc., which changes with the user.
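The makefile-style staleness check described above can be sketched in Python roughly like this (file layout and names are invented; the real code keeps a %dependences hash per page in its cache directory):

```python
import os

def page_is_stale(cache_file, dependences):
    """Return True when the cached body must be regenerated: the cache is
    missing, or some dependence was modified after the cache was written
    (the makefile-style check described above)."""
    if not os.path.exists(cache_file):
        return True
    cache_mtime = os.path.getmtime(cache_file)
    return any(os.path.getmtime(dep) > cache_mtime
               for dep in dependences
               if os.path.exists(dep))
```

When the check reports a stale page, view would rebuild the body and rewrite the cache; otherwise it serves the cached body and runs only the user-dependent parts.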
After having experimented a little, I found that the cache mechanism is
quite useful:
- I do not need to send the "preview" body as a hidden field inside the page: I store it in a newly created cache directory for the same page, and on "save" I simply rename the new cache as the old one.
- This allows me to "recover" a lost editing session (this is not implemented at present, but it will be in a few days)
- the %dependences hash gives the connection structure of the pages in the wiki
- using the same mechanism I can cache dynamically generated pages: for instance, I use a SLURP plugin so as to allow people to attach a latex file (possibly with images) and have it decompressed, compiled, converted to html and stored in the cache directory. Similarly, I can offer people the option to generate an html page with a lot of gifs (say, from a word document with
formulas, if word offers this option), zip it and attach it to a page, and have it displayed with the same SLURP. Or I can offer automatic conversion from word to html, if the accuracy of converters is sufficient. I'm also using some tools to generate images (GNUPLOT) and animated mathematical images (again with gnuplot + ImageMagick) and they too are cached.
- I'm also considering the option to cache the body of the page as a standalone html document. With a framed interface, I can offer a clear division between control elements and contents. Moreover, I can simply 'wget' the cache tree to have a standalone "image" of the wiki, without
any load on the server.
- Another possibility is to "transclude" external html pages (again with the SLURP syntax) and cache them, so that the whole thing can be stored on a cdrom
- I would like to offer people the opportunity of having the pages automatically translated by babelfish. I would prefer to translate the source, instead of the html, since in this case the user can actively edit the translation and contribute to the wiki.
--
FrancoBagnoli - 30 Mar 2001
Since Store is meant to provide a generic storage mechanism, something like a latex file obviously has to be storable as well. This is probably via a
FileAttachment.
Or, if we follow the
FileAttachmentIsASubTopic path, a context-provided
SubTopic. OK, so we access a Topic that contains latex code that needs to
be rendered. The context-aware render pipeline knows (it's been told, I
guess) to pass this to the render part of the
LatexPlugin. This does its
job, creates some
HTML and graphics. What we need then is the Cache
mechanism to be able to take partially rendered pages and attachments and
store them, then using the meta-information system add an "I'm in the Cache"
by-line to the Topic in question.
So the next time the Render pipe passes through that Topic, instead of
"pass me to the latex plugin renderer" it's "pass me to the latex plugin
cached handler."
The storage system doesn't care, and it's just left to some carefully worked
interaction between plugins and the render pipeline.
[...]
I want to get away from the low-level storage sub-system being created with
dependencies like this. It makes things like abstracting clusters and DBIs
very tricky. I'd prefer that everything except TWiki::Storage not depend on
there being a filesystem. (Of course that depends on TWiki::Cache's and
TWiki::Index's design, but you get my point.)
--
NicholasLee - 30 Mar 2001
It just struck me that the
TWikiPreferences variable %WIKIWEBLIST% (
Main | TWiki06x01 | Sandbox) already encodes the idea that Webs are Sub[Topics/Webs] of the root Web/Topic.
--
NicholasLee - 1 Apr 2001
I agree that the system should be independent of the storage medium, but most "external" tools (like latex2html for the math extension) rely on an actual filesystem. So I propose that the Store library should offer the possibility of accessing the data as a filesystem,
creating a directory "on the fly" if necessary, or reading from it.
I mean: if I need to process a topic using an existing tool (say, gnuplot for creating images on the fly), I could ask TWiki::Store to give me the name of the "preview" directory. If the storage mechanism is not a plain filesystem, this is created "on the fly" (maybe on a ram disk). At the end of the elaboration, control is given back to the Store library, which can for instance compress everything and put it in a database, or maybe send it to another server.
For what concerns the cache system, I think that it should work in this way (see also
MathModePluginDev):
- All "processing" elements (rendering routines, plugins, etc.) should be divided into "static" or "dynamic" ones, by using the appropriate hook. For instance, the rendering of a "bold" portion is static, as is the smilies plugin. On the other hand, the calendar plugin or the processing of a %DATE% tag is dynamic.
- In the preview phase, a preview directory is created, using a unique name (it may reside in the pub dir). The contents of the existing preview dir are copied into this new directory (this can be used by plugins like latex2html which have their own cache system, or if one is using a Makefile for updating something). The raw page contents and the page processed by the static plugins are also stored in this directory. Then the page is processed by the dynamic plugins and is returned to the user. The name of the preview directory is returned in a hidden field (this replaces the field that at present contains the page contents).
- If the user saves the page, the old preview directory is deleted and the cached raw page contents are used to replace the page in the data directory. The name of the "preview" directory is stored in a META tag. Alternatively, if the "official" preview directory has a fixed name, the new preview directory is renamed.
- If the user cancels editing, the new preview directory is deleted.
- There is the possibility of recovering a crashed editing session, by examining the "preview" directories in the pub dir. They can be eliminated by a cron job.
- The view phase corresponds to the "dynamic" processing of the partially processed cache.
- At save time, one could also extract tags and variables from the page, and store them in a quickly accessible structure (say, a Storable dump of a hash structure). In this way one can access page permissions and page variables quickly. Moreover, it is possible to implement an access control mechanism independent of twiki: whenever a web/topic or web/pub is accessed, a handler can load the dump of variables and control the access permissions, without knowing the TWiki syntax. I think that metadata and other variables can definitely be accessed separately from text, in a way similar to Zope properties, and used or accessed from the page. This would make editing data simpler. Let me explain better: imagine that one has the opportunity to edit a page's metadata or variables or access rights using the "More" button. In this way one gets a form with suitable prompts and fields. Then one can use these data inside the page (or the template).
- This approach can be useful also for having an ftp server in parallel with the http server. If the ftp server is "TWiki aware" about permissions, lock, etc. one can exploit the capability of many editors of accessing files via the ftp protocol, or write a custom wrapper for downloading and uploading the file. This would allow the use of the preferred editor (similar to what Zope does).
- Finally, the distinction between a page and its pub directory blurs: one could store everything into the "pub" directory. In this way a topic becomes a directory, and the implementation of multi-level webs (if needed) should be much more natural.
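The static/dynamic split proposed in the first bullet above can be sketched in toy Python (tag syntax and handlers are invented; the point is only that static expansions are done once and cached, while dynamic tags are expanded on every view over the partially processed text):

```python
import re

# Invented handlers: "static" ones run once at save/preview time and
# their output is cached; "dynamic" ones run on every view.
def expand_bold(match):
    return "<b>%s</b>" % match.group(1)

def expand_date(match):
    return "06 Apr 2002"   # stand-in for a real clock

def static_pass(text):
    """Save-time pass: the result is what goes into the page cache."""
    return re.sub(r"%BOLD\{(.*?)\}%", expand_bold, text)

def dynamic_pass(cached):
    """View-time pass over the partially processed cached text."""
    return re.sub(r"%DATE%", expand_date, cached)

cached = static_pass("%BOLD{hello}% on %DATE%")
page = dynamic_pass(cached)
```

Only the second pass runs per request, which is where the caching win comes from.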
--
FrancoBagnoli - 06 Apr 2002
Everything New Is Old
I've had this set of ideas trundling 'round my insomniac brain for a few nights, but never wrote it down publicly because I thought it was just me who thought this would simplify things. (And I don't have the skill to implement it anyway.) However a trip to
TWikiIRC confirms that I'm not silly -- and pointed me to this topic which discussed largely the same thing a few years ago.
- user entered text, '%TEXT%', is a single text file, like now
- metadata is in a separate file, or perhaps several files
- =pub= and =data= are collapsed into a single entity
- the whole directory is versioned, so anything which is placed in the folder is automatically saved
- the versions are in a different location, making for easy backups (e.g.
/wiki/RCS/someweb/TopicsWereFolders,v )
/wiki/someweb/TopicsWereFolders:
TopicsWereFolders.txt # %TEXT%
attach0.jpg # an attachment
attach1.pdf # another one
access.meta # access restrictions/privileges
TopicClassification.meta # FeatureBrainstorming, NextGeneration,...
NotifyList.meta # who's to be notified on topic changes
form3.meta
form4.meta
etc.
INCLUDE would reference the folder name, and what to be included passed as a parameter. This allows straightforward access to the attaches, metadata, etc. If no parameter is supplied, '%TEXT%' is assumed.
WebChanges is parameterized similarly so that each aspect can be tracked separately, e.g. show me what topics have changed, but ignore form data edits. Or, what attachments have changed? Or, what access restrictions have been edited? ...
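A small sketch of how the parameterized INCLUDE might resolve (purely hypothetical): the folder name selects the topic, and an optional parameter selects which file inside the folder to pull in, with %TEXT% (the TopicName.txt file) as the default:

```python
def resolve_include(folder, part=None):
    """Map an INCLUDE of a topic folder to a concrete file inside it.
    With no parameter, %TEXT% (TopicName.txt) is assumed; otherwise the
    named attachment or .meta file is selected. Purely hypothetical."""
    folder = folder.rstrip("/")
    topic = folder.rsplit("/", 1)[-1]
    if part is None:
        part = topic + ".txt"      # default: the topic text itself
    return folder + "/" + part

text_path = resolve_include("/wiki/someweb/TopicsWereFolders")
meta_path = resolve_include("/wiki/someweb/TopicsWereFolders", "access.meta")
```

The same resolution rule would serve attachments, metadata and forms uniformly, which is what makes the folder layout attractive.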
There is a lot of good follow up in the irc log:
http://koala.ilog.fr/twikiirc/bin/irclogger_log/twiki?date=2004-06-01,Tue&sel=28#l24
http://koala.ilog.fr/twikiirc/bin/irclogger_log/owiki?date=2004-06-01,Tue&sel=18#l14
--
MattWilkie - 01 Jun 2004
(following copied here from owiki irc
and lightly edited by Matt)
Nope, it's not a silly idea. I'm going down that route with the python code
It's also been discussed before on twiki.org, IIRC something like
TWiki:Codev.AreWebsTopics
?
Part of my rationale here is named sections are in fact topics
Which means if
A) named sections are topics, and
B) a topic can contain sections, and
C) things that contain topics are webs then
D) Topics
are webs/folders
Furthermore, currently there is a schism between data & pub - as you pointed out. If you stored the text in the pub directory, then you instantly gain a lot - in the manner you describe
Furthermore, a topic view then also just becomes a list of things, in order, to get pulled into a document
Secondary example:
Most wikis have
SomeTopic
Fewer wikis have
SomeTopic
with named sections you can have either
SomeTopic.SomeSection or
SomeTopic
If you choose
SomeTopic.SomeSection, this is VERY similar to TWiki-of-3-years-ago's nested webs syntax of
SomeSubweb.SomeTopic
So you at that point go - topics
are webs
Consider also an alternative syntax for versions:
Namespace.SomeTopic/ver/1
Namespace.SomeTopic/ver/2
Namespace.SomeTopic/ver/privateMS/2
Namespace.SomeTopic/ver/privateMS/3
Namespace.SomeTopic/ver/CHAOS/2
etc
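That alternative version syntax could be parsed along these lines (illustrative only):

```python
def parse_version_path(path):
    """Split 'Namespace.SomeTopic/ver/[branch/]N' into its parts.
    Returns (topic, branch, version); branch is None on the main line."""
    topic, _, rest = path.partition("/ver/")
    parts = rest.split("/")
    if len(parts) == 1:
        return topic, None, int(parts[0])
    return topic, parts[0], int(parts[1])

main_v2 = parse_version_path("Namespace.SomeTopic/ver/2")
branch_v3 = parse_version_path("Namespace.SomeTopic/ver/privateMS/3")
```

The nice property is that a version, like a section or an attachment, becomes just another addressable piece of text under the topic.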
Currently the single name
SomeTopic contains multiple pieces of text already
Even if you don't consider named sections!
Consider a further alternative -
email to wiki gateway:
If your email subject is "FooBar", then your email is either appended to the topic, or merged inline with the topic (say you include sufficient quoted context)
If you do that, then your view of the topic is still just one page
But if you look at the email thread you might have 10 different emails
All of which are contained in that single topic
So from
that perspective, again, a topic is a container
It [NamedIncludeSections] was a lot of extra syntax, but syntax is a crutch in wikis - they work better when sections are implicit IMO.
yes! implicit is good
In terms of implicit, people expect really that headings should be taken notice of
After all, why put a heading in if the thing coming up isn't important in its own right?
It also implies that each definition in a definition list should be addressable by name
For example a Glossary page with lots of small definitions would be really nice to have auto linking to
rather than being
forced down the route of lots of small pages with a search
The single page is initially at least more natural
The real gain with named include sections over the past year is I've been able to play with and test these ideas simply, without having to change the wiki dramatically to discover they're well worthwhile
To use them to fruition though requires architectural changes to make it efficient
For example - TWiki variables - these are nothing more really than named sections of their own
Again a different syntax, but nonetheless the same essential properties
The real benefits come from when you stop thinking of topics as topics, attachments as attachments, and webs as webs
And rather consider them as they really are - just chunks of text with different organisation boundaries
For example, what is a bookview of a web? Is that a topic? Is it a web?
The most basic answer I've come up with so far is that they're all "just" text, with a link.
But that then also describes email, news and other systems
And you can cut it, slice it, dice it anyway you like, but it all boils down to the simple idea is that text contains text
And we create arbitrary dividers for that text
They're normally well signalled, so there's no reason we can't get a machine to do the organising for us
For example, people have said "why can't the wiki tell you if you are writing something similar to stuff already written"
Where stuff was considered to be a topic
But individual paragraphs, bullet points, and definitions are the same
(I've been giving serious thought as to why wikis work - especially ones like TWiki which are significantly more user hostile than others like usemod or moin)
also on Owiki:Openwiki.LowLevelStorage .
--
MS - 01 Jun 2004,
-- MattWilkie - 09 Jun 2004