This topic is related to PackageTWikiTopic, except that this is a design discussion whereas PackageTWikiTopic should be more about implementation specifics (API, etc.).

Following on from some recent email discussions about dealing with file attachments, here are some ideas that bring together Webs, Topics, SubTopics and FileAttachments, plus Plugins and the Render pipeline.

This is my summary of a discussion with PeterThoeny, JohnTalintyre, FrancoBagnoli and AndreaSterbini. Most of this requires the large changes occurring in TWikiModularizationCvsBranch in order to meet some of the ideas in HowShouldTWikiBeModularized, ModularizationAndCommonTwiky and PackageTWikiStore.

Topics related to this include: PageCaching, AttachmentsUnderRevisionControl, TWikiPlugin

Note: This is a large initial content dump, some of the content will probably be farmed out to other topics.


From a practical viewpoint, I'm inclined to agree, especially about the sub-webs. However, from a conceptual viewpoint I feel that a topic is itself a sub-web when it has attachments. Additionally, I feel that we should move most stuff out of the pub document root. So in summary, I'm happy to go with the pub area for now and re-think for the future.

-- JohnTalintyre 27 Mar 2001

I'm inclined to this view. I also wonder whether i) it's not possible to just dynamically generate the file attachment information on view, and ii) we could create an index file if necessary to speed the generation up.

In fact you could base a dynamic file attachment table mechanism on a generic subweb information provision mechanism, with an implicit call to the file attachment sub topic if it exists.

i.e. pseudo code (Perl-ish; the subweb/webiterator methods are hypothetical):

 if ( $topic->subweb("attachment subweb name") ) {
     # webiterator returns a topic handle for each sub-topic
     foreach my $attachment ( $topic->webiterator("attachment subweb name") ) {
         render_file_attachment_row($attachment);
     }
 }

Note: I'm floating some ideas around my head with regards to (TWiki::)Web and (TWiki::)Topic, where, say, a Web is just the primary node of a Topic tree.

-- NicholasLee - 27 Mar 2001

One thing I'm not sure about is the relationship of attachments to topics. Are they completely different or essentially the same thing? Also, what about meta information for a topic? If a web is simply a topic at the top level: at present we have reserved topic names for meta information (e.g. preferences), and top-level meta information lives in a specific web (TWiki) - a bit messy.

What about meta-information for attachments? Currently it is a fixed set of fields in the attachment table, and security information has nowhere to live at present.

-- JohnTalintyre 27 Mar 2001

In some sense I think attachments are Topics in their own right, since they contain a certain amount of separate information content. Of course it's a thin line between, say, an image file that is part of a Topic's 'text' and, say, a Word or patch file which is 'additional' information with a sub-context.

My thoughts on a meta-information mechanism are still evolving. Consider the few words I posted in PackageTWikiStore.

I think it's not too hard to add a generic mechanism such as:

$topic_handle->meta("key", "text") or $topic_handle->meta("key", ["text", "text"])

Of course this doesn't provide very well for the author/version situation, where two arrays map 1-to-1 under a single context key.

Although we could probably fix that by providing:

$topic_handle->meta("key", {"text" => "text"})

as well.

Now the TWiki::Store sub-system would be required to provide a mechanism to the higher level for generic storage of this meta information, whether that be into an RCS file or via DBI.

{Note: with an RCS file, I'm thinking it's easy enough just to store it in the desc and version comment fields.}

Of course, for syntax sugar reasons, basic TWiki meta-information requirements would have their own interface, which might or might not sit on top of the ->meta mechanism.

i.e. ->version($version_key) and ->versions() might access the stored data directly. Particularly in the author/version RCS case, the information would be stored in different fields in the data file.
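
To make the shape of that interface concrete, here is a minimal in-memory sketch; the TWiki::Topic package, its constructor and the "versions" key are assumptions of mine, and a real back end would read and write RCS or a database instead of a plain hash:

package TWiki::Topic;
use strict;
use warnings;

sub new { my ( $class, %args ) = @_; return bless { meta => {}, %args }, $class }

# meta("key")                   -> read back what was stored
# meta("key", "text")           -> store a scalar
# meta("key", [..]) or {..}     -> store a list or a map
sub meta {
    my ( $self, $key, $value ) = @_;
    $self->{meta}{$key} = $value if defined $value;
    return $self->{meta}{$key};
}

# Syntax-sugar accessors for core meta-information; with an RCS back end
# these would read the author/version fields of the ,v file directly.
sub versions { my ($self) = @_; return keys %{ $self->meta("versions") || {} } }
sub version  { my ( $self, $v ) = @_; return $self->meta("versions")->{$v} }

package main;

my $topic = TWiki::Topic->new( name => "FileAttachmentIsASubTopic" );
$topic->meta( "versions", { "1.2" => "JohnTalintyre", "1.1" => "NicholasLee" } );
print $topic->version("1.2"), "\n";    # -> JohnTalintyre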

Attachments are an important part of TWiki; in fact I think they tie in nicely with a SubTopic system. So the question is, what meta-information is core to rendering them effectively in TWiki, and do we need an additional mechanism on top of something like meta(..) to deal with this?

From an implementation point of view, way down the road, it might be nice in a fully objectized code base to have a generic topic/subtopic handle that 'knew' whether it was a standard topic/subtopic or a file attachment topic. Or quite possibly something else if we add that functionality.

-- NicholasLee - 28 Mar 2001

Interesting point about objectizing webs / topics / attachments and looking at all content as a tree. We can certainly make things cleaner that way.

Open questions are

  • performance
  • how to handle meta data per web (especially the category table)
  • where does search look in regard to the current tree node

-- PeterThoeny - 28 Mar 2001

Performance

In the current CGI environment, probably 10-20% on invocation. In mod_perl we'll probably see that disappear as the code is cached.

Note, you couldn't really write the current TWiki-style code in a mod_perl environment very well. Debugging would be hell. I'd think that it would basically turn out objectized.

That's without considering the run-time optimisation tricks you can play in a persistent code environment: caching indexes in shared memory, etc.

I'd think also that in most cases the object CGI version would be fine, and that if someone needs speed for their 1000 users then they'll have time to fine-tune a mod_perl installation.

Web meta data


Search tree

Since it's a tree, I guess we should say any search function searches up the branch to the end node. I guess in the long run it depends on the indexing tech we use.

-- NicholasLee - 30 Mar 2001

With Franco (Bagnoli) we are making a Latex plugin using Latex2html. It is rather heavy, because it must produce a lot of gifs for formulas not displayable through normal HTML. So, Franco has made a first attempt at a cache mechanism ... for a version of TWiki that's at least 1 year old. His initial comments on the topic were at PageCaching.

We want to get in sync with the new twiki release. Is there any plan to have a general caching mechanism in the new experimental Store branch?

-- AndreaSterbini - 30 Mar 2001

This led to these comments: it depends on what you mean by a page cache.

Is it a version of the rendered page stored in memory? Then the storage system doesn't really care.

There is a small issue, since the plugin system is unrelated to the storage system (well, it's trying to be), but the latex rendering occurs in the plugin.

i.e. in order to cache, there has to be a way for the latex plugin to delegate later plugin requests to the page cache.

-- NicholasLee - 30 Mar 2001

I developed a hackerized version of twiki in order to allow people with scientific interests to share ideas (both in teaching and in research).

One of the most important consequences of this approach is the need to write mathematics in TWiki. And the only way to do it is with latex, which is the standard tool in mathematics and physics (but see later).

After many experiments, I decided that latex2html is the most dependable tool: it can handle almost any latex construct, since as a last resort it calls latex and then ghostscript to have the mathematics translated into gif or png images. The drawback of this approach is that it is quite slow, especially on the first run. Indeed, latex2html is smart: it holds a hash of the mathematical expressions it cannot handle directly, associated with the resulting gif image. Only when the mathematics changes is the resulting image regenerated, and if the same mathematics appears more than once in the document, it is converted only once.

Clearly, I need a cache directory for each page where the intermediate files are stored. What I do is the following: I use a %dependences hash, initialized with wiki. In the readFile subroutine, the newly read file is added to %dependences (I use a hash so as to check for multiple inclusions of the same file), and the %dependences hash is then saved in the cache directory. When view is called, it first checks whether the modification time of the called page is newer than any dependence, and if so the page is regenerated (similar to a makefile, which I also used in an early version). Since the page actually depends on the user's preferences (I'm planning to build a real international version of TWiki), I only cache the "body" part. The page is always rebuilt if it contains information about date, username, etc., which changes with the user.
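
For what it's worth, a rough sketch of that makefile-style freshness check might look like the following; the file names, and the idea of saving %dependences with Storable into the page's cache directory, are my assumptions rather than Franco's actual code:

use strict;
use warnings;
use Storable qw(retrieve);

# Return true if the cached body in $cache_dir is still newer than every
# file recorded in the saved %dependences hash (assumed layout: body.html
# plus dependences.stored written with Storable at render time).
sub cached_body_is_fresh {
    my ($cache_dir) = @_;
    my $body_file = "$cache_dir/body.html";
    my $deps_file = "$cache_dir/dependences.stored";
    return 0 unless -e $body_file && -e $deps_file;

    my $body_mtime  = ( stat $body_file )[9];
    my $dependences = retrieve($deps_file);    # { filename => 1, ... }

    for my $file ( keys %$dependences ) {
        return 0 unless -e $file;                      # a dependence vanished
        return 0 if ( stat $file )[9] > $body_mtime;   # a dependence is newer
    }
    return 1;    # the static "body" part can be served from the cache
}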

After having experimented a little, I found that the cache mechanism is quite useful:

  • I do not need to send the "preview" body as a hidden field inside the page: I store it in a newly created cache directory for the same page, and with "save" I simply rename the new cache as the old one.
  • This allows me to "recover" a lost editing session (this is not implemented at present, but it will be in a few days)
  • the %dependences hash gives the connection structure of the pages in the wiki
  • using the same mechanism I can cache dynamically generated pages: for instance, I use a SLURP plugin to allow people to attach a latex file (possibly with images) and have it decompressed, compiled, converted to html and stored in the cache directory. Similarly, I can offer people to generate an html page with a lot of gifs (say, from a Word document with formulas, if Word offers this option), zip it and attach it to a page, and have it displayed with the same SLURP. Or I can offer automatic conversion from Word to html, if the accuracy of the converters is sufficient. I'm also using some tools to generate images (gnuplot) and animated mathematical images (again with gnuplot + ImageMagick), and they too are cached.
  • I'm also considering the option of caching the body of the page as a standalone html document. With a framed interface, I can offer a clear division between control elements and contents. Moreover, I can simply 'wget' the cache tree to have a standalone "image" of the wiki, without any load on the server.
  • Another possibility is to "transclude" external html pages (again with the SLURP syntax) and cache them, so that the whole thing can be stored on a CD-ROM
  • I would like to offer people the opportunity of having the pages automatically translated by babelfish. I would prefer to translate the source, instead of the html, since in this case the user can actively edit the translation and contribute to the wiki.

-- FrancoBagnoli - 30 Mar 2001

Since Store is meant to provide a generic storage mechanism, something like a latex file obviously has to be able to be stored as well. This is probably via a FileAttachment.

Or, if we follow the FileAttachmentIsASubTopic path, a context-provided SubTopic. OK, so we access a Topic that contains latex code that needs to be rendered. The context-aware render pipeline knows (it's been told, I guess ;-)) to pass this to the render part of the LatexPlugin. This does its job and creates some HTML and graphics. What we need then is for the Cache mechanism to be able to take partially rendered pages and attachments and store them, then, using the meta-information system, add an "I'm in the Cache" by-line to the Topic in question.

So the next time the Render pipe passes through that Topic, instead of "pass me to the latex plugin renderer" it's "pass me to the latex plugin cached handler."

The storage system doesn't care, and it's just left to some carefully worked interaction between plugins and the render pipeline.
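
Roughly, that interaction might look like the sketch below; nothing in it is an existing TWiki API (the cache is just a hash, the latex2html call is passed in as a code ref, and meta() is the accessor floated earlier in this topic):

use strict;
use warnings;

sub handle_latex_section {
    my ( $topic_handle, $cache, $latex_source, $render_latex ) = @_;

    # A previous pass left its "I'm in the Cache" by-line in the topic's
    # meta-information, so try the cached handler before re-rendering.
    if ( $topic_handle->meta("latexCached") && exists $cache->{$latex_source} ) {
        return $cache->{$latex_source};
    }

    # Cache miss: render via latex2html, store the partial result, and flag
    # the topic so the next render pass goes straight to the cached handler.
    my $html = $render_latex->($latex_source);
    $cache->{$latex_source} = $html;
    $topic_handle->meta( "latexCached", 1 );
    return $html;
}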

[...]

I want to get away from the low-level storage sub-system being created with dependencies like this. It makes things like abstracting clusters and DBIs very tricky. I'd prefer that everything except TWiki::Storage not depend on there being a filesystem. (Of course that depends on TWiki::Cache and TWiki::Index's design, but you get my point.)

-- NicholasLee - 30 Mar 2001

It just struck me that the TWikiPreferences variable %WIKIWEBLIST% (Main | TWiki06x00 | Sandbox) already encodes the idea that Webs are Sub[Topics/Webs] of the root Web/Topic.

-- NicholasLee - 1 Apr 2001

I agree that the system should be independent of the storage medium, but most "external" tools (like latex2html for the math extension) rely on an actual filesystem. So I propose that the Store library should offer the possibility of accessing the data as a filesystem, creating a directory "on the fly" or reading from it if necessary.

I mean: if I need to process a topic using an existing tool (say, gnuplot for creating images on the fly), I could ask TWiki::Store to give me the name of the "preview" directory. If the storage mechanism is not a plain filesystem, this is created "on the fly" (maybe on a ram disk). At the end of the processing, control is given back to the Store library, which can for instance compress everything and put it in a database, or maybe send it to another server.
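
As a sketch of what that could look like from the store side (getWorkArea and saveWorkArea are invented method names, and File::Temp stands in for the "on the fly" or ram-disk case):

use strict;
use warnings;
use File::Temp ();

package TWiki::Store::Hypothetical;

sub getWorkArea {
    my ( $self, $web, $topic ) = @_;
    # If the back end is not a plain filesystem, materialise a scratch
    # directory on the fly; it is cleaned up when the object goes away.
    return File::Temp->newdir( "twiki_${web}_${topic}_XXXX", TMPDIR => 1 );
}

sub saveWorkArea {
    my ( $self, $web, $topic, $dir ) = @_;
    # Here the real store would compress the directory contents and push
    # them into RCS, a database or another server; omitted in this sketch.
    return;
}

package main;

# my $dir = $store->getWorkArea( "Codev", "MathModePluginDev" );
# ... run gnuplot / latex2html inside $dir ...
# $store->saveWorkArea( "Codev", "MathModePluginDev", $dir );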

For what concerns the cache system, I think that it should work in this way (see also MathModePluginDev):

  • All "processing" elements (rendering routines, plugins, etc.) should be divided into "static" and "dynamic" ones, using an appropriate hook. For instance, the rendering of a "bold" portion is static, as is the smilies plugin. On the other hand, the calendar plugin or the processing of a %DATE% tag is dynamic.
  • In the preview phase, a preview directory is created, using a unique name (it may reside in the pub dir). The contents of the existing preview dir are copied into this new directory (this can be used by plugins like latex2html which have their own cache system, or if one is using a Makefile for updating something). The raw page contents and the page processed by the static plugins are also stored in this directory. Then the page is processed by the dynamic plugins and is returned to the user. The name of the preview directory is returned in a hidden field (this replaces the field that at present contains the page contents).
  • If the user saves the page, the old preview directory is deleted and the cached raw page contents are used to replace the page in the data directory. The name of the "preview" directory is stored in a META tag. Alternatively, if the "official" preview directory has a fixed name, the new preview directory is renamed.
  • If the user cancels editing, the new preview directory is deleted.
  • There is the possibility of recovering a crashed editing session, by examining the "preview" directories in the pub dir. They can be eliminated by a cron job.
  • The view phase corresponds to the "dynamic" processing of the partially processed cache.
  • At save time, one could also extract tags and variables from the page, and store them in a quickly accessible structure (say, a Storable dump of a hash; see the sketch after this list). In this way one can access page permissions and page variables quickly. Moreover, it is possible to implement an access control mechanism independent of TWiki: whenever a web/topic or web/pub is accessed, a handler can load the dump of variables and check the access permissions, without knowing the TWiki syntax. I think that metadata and other variables can definitely be accessed separately from the text, in a way similar to Zope properties, and used or accessed from the page. This would make editing data simpler. Let me explain better: imagine that one can edit a page's metadata, variables or access rights using the "More" button. In this way one gets a form with suitable prompts and fields. Then one can use these data inside the page (or the template).
  • This approach can also be useful for having an ftp server in parallel with the http server. If the ftp server is "TWiki aware" about permissions, locks, etc., one can exploit the capability of many editors to access files via the ftp protocol, or write a custom wrapper for downloading and uploading the file. This would allow the use of one's preferred editor (similar to what Zope does).
  • Finally, the distinction between a page and its pub directory blurs: one could store everything in the "pub" directory. In this way a topic becomes a directory, and the implementation of multi-level webs (if needed) should be much more natural.
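
Here is a sketch of the save-time dump mentioned in the list above; the file names, the "Set NAME = value" pattern and the ALLOWTOPICVIEW check are illustrative assumptions, not a proposal for the exact format:

use strict;
use warnings;
use Storable qw(nstore retrieve);

# At save time: collect "   * Set NAME = value" assignments from the raw
# topic text and dump them so other handlers can read them without parsing
# any TWiki syntax.
sub save_page_properties {
    my ( $topic_text, $props_file ) = @_;
    my %props;
    while ( $topic_text =~ /^\s+\*\s+Set\s+(\w+)\s*=\s*(.*)$/mg ) {
        $props{$1} = $2;
    }
    nstore( \%props, $props_file );
    return \%props;
}

# At access time: an access-control handler (http or ftp) loads the dump
# and checks permissions directly.
sub may_view {
    my ( $user, $props_file ) = @_;
    my $props = -e $props_file ? retrieve($props_file) : {};
    my $allow = $props->{ALLOWTOPICVIEW} or return 1;    # no restriction set
    return scalar grep { $_ eq $user } split /\s*,\s*/, $allow;
}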

-- FrancoBagnoli - 06 Apr 2002


Everything New Is Old

I've had this set of ideas trundling 'round my insomniac brain for a few nights, but never wrote it down publicly because I thought it was just me who thought this would simplify things (and I don't have the skill to implement it anyway). However, a trip to TWikiIRC confirms that I'm not silly -- and pointed me to this topic, which discussed largely the same thing a few years ago. :-)

What if TopicsAreFolders?

  • user entered text, '%TEXT%', is a single text file, like now
  • metadata is in a separate file, or perhaps several files
  • pub and data are collapsed into a single entity
  • the whole directory is versioned, so anything which is placed in the folder is automatically saved
    • the versions are in a different location, making for easy backups (e.g. /wiki/RCS/someweb/TopicsWereFolders,v )

/wiki/someweb/TopicsWereFolders:
   TopicsWereFolders.txt         # %TEXT%
   attach0.jpg                   # an attachment
   attach1.pdf                   # another one
   access.meta                   # access restrictions/privileges
   TopicClassification.meta      # FeatureBrainstorming, NextGeneration,...
   NotifyList.meta               # who's to be notified on topic changes
   form3.meta                        
   form4.meta
   etc.

INCLUDE would reference the folder name, with what is to be included passed as a parameter. This allows straightforward access to the attachments, metadata, etc. If no parameter is supplied, '%TEXT%' is assumed.

WebChanges is parameterized similarly so that each aspect can be tracked separately, e.g. show me what topics have changed, but ignore form data edits. Or, what attachments have changed? Or, what access restrictions have been edited? ...
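
A sketch of how such a parameterized INCLUDE might resolve against the folder layout above; the part= parameter and the folder root are assumptions, not an existing TWiki feature:

use strict;
use warnings;

my $wiki_root = "/wiki";

# %INCLUDE{"TopicsWereFolders"}%                     -> the '%TEXT%' file
# %INCLUDE{"TopicsWereFolders" part="access.meta"}%  -> the access metadata
sub include_from_folder {
    my ( $web, $topic, $part ) = @_;
    # No parameter supplied: fall back to the topic text, <TopicName>.txt
    $part = "$topic.txt" unless defined $part && length $part;
    my $file = "$wiki_root/$web/$topic/$part";
    open my $fh, "<", $file or return "<!-- $file not found -->";
    local $/;                    # slurp the whole part in one read
    return scalar <$fh>;
}

# print include_from_folder( "someweb", "TopicsWereFolders" );
# print include_from_folder( "someweb", "TopicsWereFolders", "access.meta" );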

There is a lot of good follow up in the irc log: http://koala.ilog.fr/twikiirc/bin/irclogger_log/twiki?date=2004-06-01,Tue&sel=28#l24
http://koala.ilog.fr/twikiirc/bin/irclogger_log/owiki?date=2004-06-01,Tue&sel=18#l14

-- MattWilkie - 01 Jun 2004

(following copied here from owiki irc and lightly edited by Matt)

Nope, it's not a silly idea. I'm going down that route with the python code. It's also been discussed before on twiki.org, IIRC something like TWiki:Codev.AreWebsTopics? Part of my rationale here is that named sections are in fact topics. Which means: if A) named sections are topics, and B) a topic can contain sections, and C) things that contain topics are webs, then D) topics are webs/folders.

Furthermore, currently there is a schism between data & pub, as you pointed out. If you stored the text in the pub directory, then you instantly gain a lot, in the manner you describe. Furthermore, a topic view then also just becomes a list of things, in order, to get pulled into a document.

Secondary example:

Most wikis have SomeTopic. Fewer wikis have SomeTopic with named sections, where you can have either SomeTopic.SomeSection or SomeTopic#SomeSection. If you choose SomeTopic.SomeSection, this is VERY similar to TWiki-of-3-years-ago's nested webs syntax of SomeSubweb.SomeTopic. So at that point you go: topics are webs.

Consider also an alternative syntax for versions:

Namespace.SomeTopic/ver/1
Namespace.SomeTopic/ver/2
Namespace.SomeTopic/ver/privateMS/2
Namespace.SomeTopic/ver/privateMS/3
Namespace.SomeTopic/ver/CHAOS/2
etc
Currently the single name SomeTopic contains multiple pieces of text already, even if you don't consider named sections!
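
Taking that path syntax apart mechanically might look like this; the field names are just my reading of the examples above, not an agreed format:

use strict;
use warnings;

# "Namespace.SomeTopic/ver/privateMS/2" ->
#   { namespace => "Namespace", topic => "SomeTopic",
#     branch => "privateMS", revision => 2 }
sub parse_versioned_name {
    my ($name) = @_;
    my ( $path, $rest ) = split m{/ver/}, $name, 2;
    return undef unless defined $rest;
    my ( $namespace, $topic ) = split /\./, $path, 2;
    my ( $branch, $revision ) =
        $rest =~ m{^(?:(\w+)/)?(\d+)$} ? ( $1, $2 ) : ( undef, undef );
    return {
        namespace => $namespace,
        topic     => $topic,
        branch    => $branch,    # e.g. "privateMS" or "CHAOS"; undef on the trunk
        revision  => $revision,
    };
}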

Consider a further alternative, an email-to-wiki gateway: if your email subject is "FooBar", then your email is either appended to the topic, or merged inline with the topic (say you include sufficient quoted context). If you do that, then your view of the topic is still just one page. But if you look at the email thread you might have 10 different emails, all of which are contained in that single topic. So from that perspective, again, a topic is a container.

It [NamedIncludeSections] was a lot of extra syntax, but syntax is a crutch in wikis - they work better when sections are implicit, IMO.

yes! implicit is good

In terms of implicit, people really expect headings to be taken notice of. After all, why put a heading in if the thing coming up isn't important in its own right? It also implies that each definition in a definition list should be addressable by name. For example, a Glossary page with lots of small definitions would be really nice to have auto-linking to, rather than being forced down the route of lots of small pages with a search. The single page is, initially at least, more natural.

The real gain with named include sections over the past year is that I've been able to play with and test these ideas simply, without having to change the wiki dramatically, and to discover they're well worthwhile. To use them to fruition, though, requires architectural changes to make them efficient. For example, TWiki variables - these are really nothing more than named sections of their own. Again a different syntax, but nonetheless the same essential properties. The real benefits come when you stop thinking of topics as topics, attachments as attachments, and webs as webs, and rather consider them as they really are - just chunks of text with different organisation boundaries.

For example, what is a bookview of a web? Is that a topic? Is it a web? The most basic answer I've come up with so far is that they're all "just" text, with a link. But that then also describes email, news and other systems. You can cut it, slice it, dice it any way you like, but it all boils down to the simple idea that text contains text, and we create arbitrary dividers for that text. They're normally well signalled, so there's no reason we can't get a machine to do the organising for us.

For example, people have said "why can't the wiki tell you if you are writing something similar to stuff already written", where "stuff" was considered to be a topic. But individual paragraphs, bullet points, and definitions are the same. (I've been giving serious thought to why wikis work - especially ones like TWiki which are significantly more user-hostile than others like usemod or moin.)

See also Owiki:Openwiki.LowLevelStorage.

-- MS - 01 Jun 2004, -- MattWilkie - 09 Jun 2004
