I thought it would be helpful to give a quick status review of the stage that the TWiki core code has reached with 4.2, and of what is on the brink of being done. This is mainly for the benefit of any developers considering contributing to the core for the next release.

Here's where we stand with 4.2:

  • At long last we have a rational abstraction of "user"
  • The store abstraction is almost ready to support implementation of other back-ends (for example, databases)
  • We have the beginnings of a ContentAccessSyntax, courtesy of the query search and IF statement work
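To make the idea of a store abstraction concrete, here is a minimal Python sketch (TWiki itself is Perl, and every class and method name below is hypothetical, not the real TWiki store API). The core talks only to an abstract interface, so a database backend could replace the toy in-memory one without any change to core code.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical store API: the core may call only these methods."""

    @abstractmethod
    def read_topic(self, web: str, topic: str) -> str: ...

    @abstractmethod
    def save_topic(self, web: str, topic: str, text: str) -> None: ...

    @abstractmethod
    def topics(self, web: str) -> list[str]: ...

    @abstractmethod
    def search_in_text(self, web: str, pattern: str) -> list[str]:
        """Return names of topics whose raw text matches a regex."""


class MemoryStore(Store):
    """Toy backend keeping topics in a dict; a DB backend would
    implement the same four methods in its own way."""

    def __init__(self) -> None:
        self._webs: dict[str, dict[str, str]] = {}

    def read_topic(self, web, topic):
        return self._webs[web][topic]

    def save_topic(self, web, topic, text):
        self._webs.setdefault(web, {})[topic] = text

    def topics(self, web):
        return sorted(self._webs.get(web, {}))

    def search_in_text(self, web, pattern):
        rx = re.compile(pattern)
        return [t for t in self.topics(web) if rx.search(self._webs[web][t])]


def topics_mentioning(store: Store, web: str, word: str) -> list[str]:
    # Core-side code: depends only on the Store interface,
    # never on files, directories or SQL.
    return store.search_in_text(web, r"\b" + re.escape(word) + r"\b")
```

For example, after saving "Welcome to TWiki" as `Main.WebHome`, `topics_mentioning(store, "Main", "TWiki")` returns `["WebHome"]` whichever backend `store` happens to be.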
What is ready to happen next:

What needs to be done:
  • I18N (internationalisation): urgent
  • Close normal, low and enhancement status bugs
  • More performance analysis. IMHO we have all the performance analysis tools we need; we don't need any more. We just need people to run the existing tools, draw conclusions and act on them.
Right now we are all staying quiet and not checking anything in, pending two things:
  1. Resolution of the governance process for TWiki
  2. Reworking (upgrade to 4.2, new front page(s), simplified navigation, refactored Codev content, simplified documentation process) of the TWiki.org website.
These are the most important activities for TWiki ATM, and we don't want to risk taking focus away from them.

-- Contributors: CrawfordCurrie - 27 Jan 2008

Discussion

Thanks for this overview Crawford.

On the "What is ready to happen next" list, it is important that we do not just jump in and start coding. Because so few people have contributed to the core in the past year, and because we strongly wish to expand the core development team, it is important that the topic object model and the storage work are specified in detail. Once this is done, the individual pieces of work need to be defined. Not in an old-fashioned, 6000-page requirements way, but in a modern, lightweight, agile fashion. This way new contributors have a chance to join and participate, and we avoid people working in opposite directions.

I am hoping that Crawford will want to take the lead in this.

-- KennethLavrsen - 27 Jan 2008

On performance, this is a personal itch of mine, because performance tends to go down the toilet quite fast. What tools are there?

-- KoenMartens - 27 Jan 2008

On performance, an out-of-the-box TWiki is not bad. It is when you add tens of thousands of topics, use advanced searches, and have to check access rights for thousands of users that performance becomes an issue.

The good news is that the topic object model, the cache, and the storage model work will all address this aspect.

There is no doubt that the release theme for 5.0 is "performance". Or, to use a buzzword, I would say it is about the "future".

The way I sense the community spirit, there will be no big discussions about "what". It will be about "how", which is good, because then the discussion becomes technical, and that is where we are all strongest.

-- KennethLavrsen - 27 Jan 2008

@Kenneth, I don't want you to confuse "treating a topic as an object" with "Topic Object Model". The former is a refactoring of the core code to eliminate some unnecessary parameters, reinforce the store abstraction, and make the code easier to read. This work was planned for 4.2, but I decided it wasn't worth the risk back in June 2007, before I knew 4.2 was going to take so long to release. It does not affect users, plugin authors, extension authors, documenters, or anyone else who does not work directly in the core code (and it won't even scare core coders). It's done, ready to check in, all the tests pass etc, etc.

A "Topic Object Model", on the other hand, is an external interface / view of TWiki data that does impinge on users, plugin authors etc. and is the next step on the ContentAccessSyntax road. This does require careful public specification. I carefully avoided using this term for precisely this reason. It's the most exciting and sexy development proposed for the core, but it's also quite hard.

The store abstraction has reached a point where a store is effectively an extension, so I'm not quite sure why you would want a public design of the store backends. Knowing that someone else is working on an implementation of the same store as you is clearly helpful, of course.

@Koen, the main performance analysis tool is the Monitor class, which allows you to instrument the core code in a relatively non-intrusive way for performance analysis with CPAN:Benchmark. When used alongside CPAN:SmallProf this is the best way I have found to analyse the performance, though Sven is working on some promising lower level tools as well. Performance analysis based on time or ab or even DProf is so inaccurate and subject to Perl effects that it's no better than opinion, and IMHO not worth wasting time with, except for very crude comparisons.
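The Monitor class itself is Perl and builds on CPAN:Benchmark; what follows is only a Python analogue of the same idea, not the Monitor API. The `Monitor` name and its `mark`/`report` methods are hypothetical. Named checkpoints are dropped into the code path, and the wall-clock and CPU time spent between consecutive checkpoints are reported separately, which helps tell computation apart from time spent waiting.

```python
from __future__ import annotations

import time


class Monitor:
    """Hypothetical checkpoint timer in the spirit of instrumenting core code.

    Each mark() records wall-clock and CPU time; report() returns the
    time spent between consecutive marks.
    """

    def __init__(self) -> None:
        self._marks: list[tuple[str, float, float]] = [
            ("start", time.perf_counter(), time.process_time())
        ]

    def mark(self, label: str) -> None:
        self._marks.append((label, time.perf_counter(), time.process_time()))

    def report(self) -> list[tuple[str, float, float]]:
        """Return (label, wall_seconds, cpu_seconds) for each interval."""
        rows = []
        for (_, w0, c0), (label, w1, c1) in zip(self._marks, self._marks[1:]):
            rows.append((label, w1 - w0, c1 - c0))
        return rows


if __name__ == "__main__":
    mon = Monitor()
    sum(i * i for i in range(200_000))       # stand-in for "load preferences"
    mon.mark("load preferences")
    sorted(str(i) for i in range(100_000))   # stand-in for "render topic"
    mon.mark("render topic")
    for label, wall, cpu in mon.report():
        print(f"{label:20s} wall={wall:.4f}s cpu={cpu:.4f}s")
```

Like the Perl tooling described above, this only tells you which stretch of code is slow; line-level profiling (the role SmallProf plays) is still needed to see why.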

-- CrawfordCurrie - 27 Jan 2008

When I talk about the Topic Object Model I mean Topic Object Model. The internal code refactoring is to me a code refactoring that a few developers can go ahead and do without even asking for permission as a feature proposal.

When I talk about the storage model, I do not care about alternative ways of just storing a topic. There is no point that topics can be stored in databases if we still have to do twiki applications that have to do formatted searches the brute force way. Being able to store a topic in a database instead of a text file brings very little value on its own.

We need to start designing a long-term solution for how topics can be stored so that the content can be indexed and accessed quickly.

I would really like to see a design that enables:

  • Full compatibility with any existing TWiki application, by still supporting the old, slow but powerful regex searches. This is best done by not giving up the flat text file. I know I am not alone in this point of view.
  • Topics being saved in parallel in a topic object model, where access rights and form fields, as the most obvious candidates, are in their own database tables. This will make query searching 100 times faster than regex searches. Access control code will be much faster, as it no longer has to plough through metadata as well as topic content.
  • The topic content is split into elements that can be effectively indexed and searched using our new query search. These elements can be treated as objects by the core as well as by plugins, enabling the development of very powerful new applications.
  • The database can be rebuilt from the flat files on demand. You should be able to delete the database files, press the rebuild button, and some minutes later the whole thing is built again.
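A minimal sketch of the dual-storage idea, assuming SQLite as the database and a `<root>/<Web>/<Topic>.txt` file layout; the table and function names are hypothetical, and real TWiki encodes special characters inside META values, which this sketch ignores. The flat file remains the authoritative copy, form fields go into an indexed table on every save, and the "rebuild button" simply regenerates that table from the files.

```python
from __future__ import annotations

import re
import sqlite3
from pathlib import Path

# TWiki embeds form fields in the topic text as lines like:
# %META:FIELD{name="Status" attributes="" title="Status" value="Open"}%
FIELD_RE = re.compile(r'^%META:FIELD\{(.*?)\}%$', re.MULTILINE)
ATTR_RE = re.compile(r'(\w+)="(.*?)"')


def create_schema(db: sqlite3.Connection) -> None:
    # Hypothetical index table: one row per (topic, form field).
    db.execute("CREATE TABLE IF NOT EXISTS field"
               "(web TEXT, topic TEXT, name TEXT, value TEXT)")
    db.execute("CREATE INDEX IF NOT EXISTS field_nv ON field(name, value)")


def parse_fields(text: str) -> dict[str, str]:
    """Map form field name -> value (ignores TWiki's value encoding)."""
    fields = {}
    for m in FIELD_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(m.group(1)))
        if "name" in attrs:
            fields[attrs["name"]] = attrs.get("value", "")
    return fields


def index_topic(db, web: str, topic: str, text: str) -> None:
    db.execute("DELETE FROM field WHERE web=? AND topic=?", (web, topic))
    db.executemany(
        "INSERT INTO field(web, topic, name, value) VALUES (?, ?, ?, ?)",
        [(web, topic, n, v) for n, v in parse_fields(text).items()],
    )


def save_topic(root: Path, db, web: str, topic: str, text: str) -> None:
    # The flat file stays authoritative; the table is a derived index.
    path = root / web / f"{topic}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    index_topic(db, web, topic, text)
    db.commit()


def rebuild(root: Path, db) -> None:
    """The "rebuild button": throw the index away and regenerate it from files."""
    db.execute("DROP TABLE IF EXISTS field")  # also drops its index
    create_schema(db)
    for path in sorted(root.glob("*/*.txt")):
        index_topic(db, path.parent.name, path.stem,
                    path.read_text(encoding="utf-8"))
    db.commit()


def query_field(db, name: str, value: str) -> list[str]:
    """Indexed lookup instead of a regex scan over every topic file."""
    rows = db.execute(
        "SELECT web, topic FROM field WHERE name=? AND value=? "
        "ORDER BY web, topic", (name, value))
    return [f"{web}.{topic}" for web, topic in rows]
```

Because the table is derived entirely from the files, copying or unzipping topic files onto a server and calling `rebuild` is enough to bring the index back in line.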

The advantage of keeping flat files in parallel with a database store/TOM is that I can still move topics around by moving files. I can still copy an entire web by copying all its files to another TWiki server. I can download a TWiki application as a zip file of simple flat files and drop them on top of my running server. Then I press the "rebuild button" and I am good to go again.

I think it is a very good idea to stop coding new things in the core for a while and get the ideas we have been discussing for years properly designed.

  • It is my hope that the dual storage principle (text + database) can be the agreed model. There are so many advantages to doing this, and the overhead is minimal.
  • The topic object model (TOM) work is being developed.
  • Once the TOM is defined I would like to see a real database designer on the project helping defining the table design that gives the most efficient performance. Database design is a profession in its own right. Anyone can design a database. Few can do it efficiently.
  • We need to choose a good default database technology. I assume using an existing open source database solution is the obvious choice. I remember that Michael had some ideas.

The solution could be done in phases. We could simply start by placing access rights and form fields in databases. When you save a topic, the access rights would be parsed from the meta data as well as the content, and stored in tables. Form fields should be straightforward to store in tables. And a 3rd step could be to have tables in topics represented as objects that go in their own database tables. Searching in tables in topics is a regex pain today, and very inefficient.
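As an illustration of the first phase, here is a minimal Python sketch that parses access rights on save from both places they can appear (a `   * Set ALLOWTOPICVIEW = ...` line in the text and a `%META:PREFERENCE{...}%` line) and stores them in a table. The table layout and function names are hypothetical, and the precedence and permission rules are deliberately simplified: no group expansion and no web-level settings.

```python
from __future__ import annotations

import re
import sqlite3

# Text form:  "   * Set ALLOWTOPICVIEW = AliceSmith, BobJones"
SET_RE = re.compile(
    r'^(?:\t|   )+\* Set ((?:ALLOW|DENY)TOPIC[A-Z]+)[ \t]*=[ \t]*(.*)$',
    re.MULTILINE)
# Meta form:  %META:PREFERENCE{name="ALLOWTOPICVIEW" ... value="AliceSmith"}%
PREF_RE = re.compile(r'^%META:PREFERENCE\{(.*?)\}%$', re.MULTILINE)
ATTR_RE = re.compile(r'(\w+)="(.*?)"')
ACL_NAME = re.compile(r'(?:ALLOW|DENY)TOPIC[A-Z]+')


def parse_acl(text: str) -> dict[str, list[str]]:
    """Return {setting: [principals]} from both the text and the meta data."""
    settings = {}
    for name, value in SET_RE.findall(text):
        settings[name] = value
    # Assumption: META:PREFERENCE values override text settings here;
    # the real TWiki precedence rules may differ.
    for m in PREF_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(m.group(1)))
        name = attrs.get("name", "")
        if ACL_NAME.fullmatch(name):
            settings[name] = attrs.get("value", "")
    return {k: [p.strip() for p in v.split(",") if p.strip()]
            for k, v in settings.items()}


def create_schema(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS acl"
               "(web TEXT, topic TEXT, setting TEXT, principal TEXT)")


def store_acl(db, web: str, topic: str, text: str) -> None:
    """Called on save: replace this topic's rows with freshly parsed ones."""
    db.execute("DELETE FROM acl WHERE web=? AND topic=?", (web, topic))
    db.executemany(
        "INSERT INTO acl(web, topic, setting, principal) VALUES (?, ?, ?, ?)",
        [(web, topic, s, p) for s, ps in parse_acl(text).items() for p in ps])
    db.commit()


def may_view(db, web: str, topic: str, user: str) -> bool:
    """Simplified topic-level check: DENY wins, then any ALLOW list must
    contain the user; no group expansion, no web-level settings."""
    rows = db.execute("SELECT setting, principal FROM acl "
                      "WHERE web=? AND topic=?", (web, topic)).fetchall()
    deny = {p for s, p in rows if s == "DENYTOPICVIEW"}
    allow = {p for s, p in rows if s == "ALLOWTOPICVIEW"}
    if user in deny:
        return False
    return not allow or user in allow
```

The point of the design is that `may_view` reads a few indexed rows instead of re-parsing the topic text and meta data on every request.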

-- KennethLavrsen - 27 Jan 2008

There is no point that topics can be stored in databases if we still have to do twiki applications that have to do formatted searches the brute force way - one of the nightmares (challenges) we face when working on TWiki is where "shortcuts" were taken, based on huge assumptions about how things were done at lower levels in the code. One of the biggest of these is the assumption that meta-data is embedded in topic text - excellent for grep over plain text, but an utter PITA for any other store implementation. However, we can't break existing TWiki applications, so we have to continue to support this search mode - and indeed, the store API is equipped with methods that require exactly that from a store backend. How the backend implements that search is up to the implementer of that backend, and the tradeoffs they want to make.

Sven and I have done all of the work on this to date. Our approach has been from two directions; I have been hardening the dividing line between the store abstraction and the core (the store API). We have also been investigating different store back end implementations, to try and find the "sweet spots".

This is best done by not giving up the flat text file That's an assumption on your part. I have a more open mind on the subject; there are other ways to achieve the same goal. Flat files should never be assumed outside of the store backend. A store backend may use flat files, but the store abstraction does not, and must not, dictate that.

There are many ways a store could be implemented in a database without using flat files. For example, a store implementer might choose to cache the topic text with embedded meta-data in the DB, and then use SQL to search that. Another might choose to generate that form on the fly for searching. The core should not constrain that.
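To illustrate the second option, here is a minimal Python sketch of a backend that keeps topics in structured form and regenerates the legacy "text with embedded meta-data" form only when a regex search needs it. The `Topic` record and function names are hypothetical, and the generated META lines only approximate TWiki's format (no value encoding, no TOPICINFO).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """Hypothetical structured record, as a database backend might hold it."""
    name: str
    text: str
    form: Optional[str] = None
    fields: dict[str, str] = field(default_factory=dict)


def embedded_form(t: Topic) -> str:
    """Regenerate the legacy 'text with embedded meta-data' form on the fly."""
    lines = [t.text.rstrip("\n")]
    if t.form:
        lines.append(f'%META:FORM{{name="{t.form}"}}%')
    for name, value in t.fields.items():
        lines.append(f'%META:FIELD{{name="{name}" attributes="" '
                     f'title="{name}" value="{value}"}}%')
    return "\n".join(lines) + "\n"


def regex_search(topics: list[Topic], pattern: str) -> list[str]:
    """Legacy-compatible search: existing applications' regexes still match."""
    rx = re.compile(pattern, re.MULTILINE)
    return [t.name for t in topics if rx.search(embedded_form(t))]
```

The trade-off between the two options is visible here: regenerating costs CPU on every search, while caching the output of `embedded_form` in the database costs storage and must be kept in sync on save.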

Once the TOM is defined I would like to see a real database designer on the project helping defining the table design that gives the most efficient performance - I would love it if a DB expert could get involved. This is an interesting project IMHO, as TWiki has a curious mix of fixed and dynamic schemas. The best way to implement it is not at all clear.

We need to choose a good default database technology - we have one; flat files on disc. It works well for most TWikis, and it is only when you hit scaling issues that you need to look beyond it.

The solution could be done in phases - that's right, and the first phase was to decouple the store implementation from the core. We are almost there with that phase.

And a 3rd step could be to have tables... - as you probably already know, the DBCacheContrib has had an internal representation of tables in topics for some years now. Developing this abstraction, and other work on sectional editing, enabled us to see that a single fixed "view" of a topic is suboptimal. Is a topic a sequence of paragraphs with embedded tables? Or a sequence of tables separated by text? How do you represent a table that is built from a SEARCH? It's not clear yet what the best approach is; as I have described elsewhere, I favour the idea of supporting different "views" over a topic (e.g. flat text, text+tables, sectioned, purple numbers etc.), but that is by no means the only possible approach. And you will find disagreements even between Sven and me, the people closest to the problem, on the best approach to this.
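One way to picture the "views" idea is as separate parsers over the same raw topic text. This Python sketch, with hypothetical function names, offers a table view (rows written in TWiki `| cell | cell |` syntax) and a section view (split on `---+` headings). It ignores escaped pipes, `---#` headings and much else a real parser would need.

```python
from __future__ import annotations

import re


def table_view(text: str) -> list[list[list[str]]]:
    """View a topic as a list of tables; each table is a list of rows of cells."""
    tables, current = [], []
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("|") and s.endswith("|") and len(s) > 1:
            current.append([c.strip() for c in s[1:-1].split("|")])
        elif current:            # a non-table line closes the current table
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


HEADING_RE = re.compile(r'^---(\++)\s*(.*)$')


def section_view(text: str) -> list[tuple[int, str, str]]:
    """View a topic as (level, heading, body) sections split on ---+ headings.
    Level 0 holds any text before the first heading."""
    sections = [(0, "", [])]
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            sections.append((len(m.group(1)), m.group(2).strip(), []))
        else:
            sections[-1][2].append(line)
    return [(lvl, head, "\n".join(body)) for lvl, head, body in sections]
```

Neither view can see a table built from a SEARCH, since that table only exists after rendering; that is exactly the kind of case that makes a single fixed view hard to choose.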

I'm glad some more people are finally taking an interest in this; I've been raising the topics for years with little feedback, so it's great to think we may finally be able to get some joint development going.

-- CrawfordCurrie - 28 Jan 2008

The subject is generally over my head, so I have abstained from giving feedback. But I have taken an interest in it, as I see its importance for the longevity of TWiki.

-- ArthurClemens - 28 Jan 2008

Topic revision: r8 - 2008-01-28 - ArthurClemens
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.