Follow up from WhatAboutSearchEngines.

Modification time: No, dynamically generated pages do not show a Last Modified time. Does anybody know if there is a trick to do that?

-- PeterThoeny - 05 May 2000

Is there a Last Modified time HTTP header? If so, you should be able to put in an HTTP-META tag with a %LASTREVISION% variable that is formatted correctly for HTTP. This can be placed in the view.tmpl template file.

-- JamalWills - 09 May 2000

This is a small enhancement so that the HTTP header has a Last-Modified field. Search engines depend on this.

Example Last-Modified field:
Last-Modified: Tue, 25 Apr 2000 10:12:14 GMT

To do:

  • Add a http-equiv meta tag to the view.tmpl file:
    <meta http-equiv="Last-Modified" content="%LASTMODIFIED%">

  • Introduce a new variable %LASTMODIFIED% that gets the modified time of the text file in GMT. It returns the current time if the topic does not exist.

  • Introduce these new related variables: (Done once at init time for performance reasons, then use them later without further rcs calls)
    • %LASTREVMODIFIED% : Time of the last document revision.
    • %LASTREVAUTHOR% : User who saved document last.
    • %LASTREVNUMBER% : RCS revision number of latest revision.

  • The existing %REVINFO% variable needs to be retired and replaced by:
    %LASTREVNUMBER% - %LASTREVMODIFIED% by %LASTREVAUTHOR% .

-- PeterThoeny - 09 May 2000

Just realized that it is not a good idea to use the %LASTMODIFIED% time based on the RCS time stamp for the meta tag. This is because a page might get updated without increasing the revision number (for example if the user saves the same topic again within one hour). That means the meta tag needs the modified time of the text file, not the RCS time stamp. I changed above variables to reflect that.

-- PeterThoeny - 10 May 2000
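
A minimal sketch of how the %LASTMODIFIED% value described above could be computed: it takes the modified time of the topic's text file, falls back to the current time when the file does not exist, and formats the result like the example Last-Modified field. The names http_date and last_modified_value are hypothetical and not part of TWiki.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format epoch seconds as an HTTP date, e.g. "Tue, 25 Apr 2000 10:12:14 GMT".
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec);
}

# Hypothetical helper: value for %LASTMODIFIED%, based on the mtime of the
# topic's text file (not the RCS time stamp). Uses the current time when
# the file does not exist.
sub last_modified_value {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    $mtime = time unless defined $mtime;
    return http_date($mtime);
}

print http_date(956657534), "\n";   # Tue, 25 Apr 2000 10:12:14 GMT
```

Building the string from gmtime() fields keeps the day and month names in English regardless of the server's locale.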

Yeah, the difference between the stat(fname)[9] and the $revDate times gave me fits in the search code. :-)

-- KevinKinnell - 10 May 2000

I'm fairly new to Perl, so the following code may not be very elegant. I added it to the view script just after this line:

    $tmpl =~ s/%TEXT%/$text/go;

The added code:

    my $dataDir = &wiki::getDataDir();
    my @statArray = stat "$dataDir/$webName/$topic.txt";
    my $topicEpochDate = $statArray[9];
    my @localtimeArray = localtime($topicEpochDate);
    # Day and month names are quoted with qw() so the lists also work under "use strict"
    my $thisDay = (qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday))[$localtimeArray[6]];
    my $thisMonth = (qw(January February March April May June July August September October November December))[$localtimeArray[4]];

    my $topicDate = ($thisDay . ", " . $thisMonth . " " . $localtimeArray[3] . ", " . ($localtimeArray[5]+1900));

    $tmpl =~ s/%TOPICDATE%/$topicDate/go;

I then include the %TOPICDATE% variable in view.tmpl where I want the topic's last modified date to display. A couple of notes:

  1. This code should probably be put into wikicfg.pm and called as a function.
  2. Additional variables could easily be defined to display other info returned by the stat function. Probably not much use to normal users, but maybe useful for admin purposes.

-- RussellTiller - 29 Mar 2001
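
Note 1 above suggests moving this into a function. A rough sketch of what that could look like follows; the function names format_topic_date and topic_date_for_file are hypothetical, and the file path would still be built from the data directory, web and topic as in the code above.

```perl
use strict;
use warnings;

my @DAY_NAMES   = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);
my @MONTH_NAMES = qw(January February March April May June July
                     August September October November December);

# Hypothetical helper: takes the list returned by localtime() (or gmtime())
# and returns a display string such as "Thursday, March 29, 2001".
sub format_topic_date {
    my @t = @_;
    return sprintf('%s, %s %d, %d',
        $DAY_NAMES[$t[6]], $MONTH_NAMES[$t[4]], $t[3], $t[5] + 1900);
}

# Hypothetical wrapper: formats the modification time of a topic file in
# local time, returning an empty string when the file cannot be stat'ed.
sub topic_date_for_file {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    return defined $mtime ? format_topic_date(localtime($mtime)) : '';
}
```

The view script would then only need something like $tmpl =~ s/%TOPICDATE%/topic_date_for_file($path)/e, with $path standing in for the topic file name.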

Isn't there some "time to live" head element too? This is what browsers use to decide to either fetch a page from cache or from the site. (Actually, I think it usually fetches from the cache if possible and also looks at the "last modified" time on the site too, but I'm not sure.) Actually, I think it's also used by browsers to reduce cache size as needed.

-- DavidLeBlanc - 31 Mar 2001
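
The "time to live" idea corresponds to the Expires HTTP header and, in HTTP/1.1, to Cache-Control: max-age; both are response headers rather than elements of the HTML head. A small sketch of generating them for a given lifetime, where ttl_headers and http_date are hypothetical helper names:

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format epoch seconds as an HTTP date in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec);
}

# Hypothetical helper: header lines telling browsers and proxy caches that
# a page generated at $now may be reused for $ttl seconds.
sub ttl_headers {
    my ($now, $ttl) = @_;
    return (
        'Expires: ' . http_date($now + $ttl),   # absolute expiry time
        "Cache-Control: max-age=$ttl",          # relative lifetime in seconds
    );
}

print "$_\n" for ttl_headers(time, 3600);
```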

I've patched the March 2001 beta version of the view script to correctly set the Last-Modified and Expires headers, based on RussellTiller's code with some simplifications. Here's the code, including two unchanged lines at beginning and near end:

    $tmpl =~ s/%TEXT%/$text/go;    # UNCHANGED

    # RD: 14/1/02, add the Last-Modified and Expires HTTP headers
    # Required format: Last-Modified: Thu, 23 Jul 1998 07:21:56 GMT
    # Get the modified time of the file
    my $dataDir = &TWiki::getDataDir();
    my @statArray = stat "$dataDir/$webName/$topic.txt";
    my $lastModifiedSeconds = $statArray[9];    # Last modified date/time

    # Convert to time string in GMT
    my $lastModifiedString = gmtime($lastModifiedSeconds);
    $lastModifiedString =~ s/ /, /;             # Comma after the day
    $lastModifiedString =~ s/$/ GMT/;

    # Make this available as a TWikiVariable as well
    $tmpl =~ s/%LASTMODIFIED%/$lastModifiedString/go;

    # Add the HTTP headers - note that putting this in the HTML <META> element
    # has no effect on incremental fetch using GNU wget, or on many proxy
    # caches, because they only look at the HTTP headers.
    my $expireInterval = '+1d';    # Expire after this delay (see CGI.pm for syntax)
    print $query->header(-content_type => 'text/html',
                         -last_modified => $lastModifiedString,
                         -expires => $expireInterval );
    # RD: 14/1/02, end of last-modified changes

    $tmpl =~ s|</*nop/*>||goi;   # remove <nop> tags (PTh 06 Nov 2000)    # UNCHANGED
    # RD: 14/1/02, moved this header setting into $query->header call
    # print "Content-type: text/html\n\n";

This is working pretty well. Here's the output from wget, an open-source tool that grabs anything from a single web page to a whole website, run against my local TWiki host (with the patch) and against TWiki.org (without the patch):

$ wget --timestamping --html-extension -v -S http://twiki/bin/view/TWiki/TWikiPreferences
--19:09:58--  http://twiki/bin/view/TWiki/TWikiPreferences
           => `TWikiPreferences'
Resolving twiki... done.
Connecting to twiki[192.168.0.12]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 200 OK
 2 Date: Mon, 14 Jan 2002 20:19:00 GMT
 3 Server: Apache/1.3.12 (Unix)  (Red Hat/Linux) PHP/3.0.18
 4 Expires: Tue, 15 Jan 2002 20:19:00 GMT
 5 Last-Modified: Tue, 08 Jan 2002 11:27:45 GMT
 6 Connection: close
 7 Content-Type: text/html

    [ <=>                                 ] 10,253         9.78M/s

19:09:59 (9.78 MB/s) - `TWikiPreferences.html' saved [10253]


administrator@UKW2KLAP08 ~/junk
$ wget --timestamping --html-extension -v -S http://twiki.org/cgi-bin/view/TWiki/TWikiPreferences
--19:10:09--  http://twiki.org/cgi-bin/view/TWiki/TWikiPreferences
           => `TWikiPreferences'
Resolving twiki.org... done.
Connecting to twiki.org[216.136.171.204]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 200 OK
 2 Date: Mon, 14 Jan 2002 19:10:09 GMT
 3 Server: Apache/1.3.20 (Unix) PHP/4.0.6
 4 Connection: close
 5 Content-Type: text/html

    [   <=>                               ] 16,309        27.94K/s

Last-modified header missing -- time-stamps turned off.
19:10:11 (27.94 KB/s) - `TWikiPreferences.html' saved [16309]

For wget, this enables it to do incremental downloading of a TWiki site - it will only download the pages that have changed since last time, by comparing the mod time of the local copy with the latest Last-Modified time from the website. It refuses to do this if there's no Last-Modified header, as you can see.
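
The comparison described above can be sketched roughly as follows. This is an illustration of the idea only, not wget's actual algorithm, and parse_http_date and needs_download are hypothetical names.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH_INDEX = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
                   Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse an HTTP date such as "Tue, 08 Jan 2002 11:27:45 GMT" into epoch
# seconds. Returns undef if the string is missing or not in that format.
sub parse_http_date {
    my ($str) = @_;
    return unless defined $str;
    my ($mday, $mon, $year, $hour, $min, $sec) =
        $str =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MONTH_INDEX{$mon};
    # timegm() dies on out-of-range values, so trap that and return undef
    my $epoch = eval { timegm($sec, $min, $hour, $mday, $MONTH_INDEX{$mon}, $year) };
    return $epoch;
}

# Fetch when there is no local copy, when the server sent no usable
# Last-Modified value, or when the server copy is newer than the local one.
sub needs_download {
    my ($local_mtime, $last_modified) = @_;
    my $remote = parse_http_date($last_modified);
    return 1 unless defined $local_mtime && defined $remote;
    return $remote > $local_mtime ? 1 : 0;
}
```

With no Last-Modified header the sketch always downloads, which matches the "time-stamps turned off" behaviour in the second run above.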

There's still some work to be done on this, so it's not suitable for a live server just yet - this is mainly making sure that dynamic pages (e.g. WebChanges) don't have a Last-Modified based on the time of the page's source, but are always re-generated. Ideally, use of certain TWikiVariables would automatically disable the generation of the Last-Modified header, and insert an Expires 'now' header to ensure they are not cached. Anyone who knows HTTP/1.1 out there, please recommend some better headers for cache control, I'm sure these are not ideal!

One browser hint: with IE6 and probably IE4/5, when you hit Refresh or Shift/Refresh, it looks at the last-modified and expires header and uses the cache if the page has not expired (as it should do!). This was confusing after I modified the script and IE5 refused to refetch the page - the solution is to hit Ctrl/Refresh to force a full re-GET of the page.

Also, it's worth knowing that the existing HTTP_EQUIV_ON_VIEW and similar TWikiVariables only set values in the <META> element of the HTML document, and have no effect on the headers. Hence they are ignored by most proxy caches and by wget, which is why this code is needed.

I'm using this setup to provide Windows laptop users with a ReadOnlyOfflineWiki that downloads incrementally (for the long suffering dialup users :-) ) - syncing is now working nicely thanks to a Perl script. The setup requires wget 1.8 and CygWin, including CygWin's Perl, and can sync one web without syncing all the others, and post-edits the HTML to make it work with Internet Explorer.

Setting these headers should also make TWiki sites indexable by search engines, and could really enhance the cacheability of large TWiki sites such as TWiki.org, and hence improve their scalability and average response time. Perhaps more importantly it should also ensure that users see up to date information, by guaranteeing that data is cached for an appropriate time, based on the 'dynamicness' of the data (e.g. WebChanges could have an Expires time of 15 minutes perhaps, reducing load quite a bit but remaining quite dynamic, while a non-dynamic TWiki page could have an Expires time of 1 hour, perhaps). Without these headers, intervening proxy caches may well impose their own caching policies (which some people have mentioned elsewhere on TWiki.)

-- RichardDonkin - 14 Jan 2002

Re: "Setting these headers should also make TWiki sites indexable by search engines, and could really enhance the cacheability of large TWiki sites such as TWiki.org, and hence improve their scalability and average response time."

Interesting! Since I know that Google (for example) already indexes TWiki.org, I might infer that having the expire headers set properly will cause Google (or some search engines) to reindex whenever a page expires. Is that the case?

Sorry, I know it's pretty far off the point, but I'm curious.

Thanks!

-- RandyKramer - 15 Jan 2002

Not sure about what makes search engines index pages - they might just take lack of last-modified and content-length as indicators of dynamic pages, but then Google isn't bothered on TWiki.org so I suspect this is not a big issue.

I have now got Content-Length working as well, so wget should be doing incremental downloads ... but it isn't :-(

-- RichardDonkin - 15 Jan 2002

One other thing to note about the use of TWikiVariables is that use of %INCLUDE...% should cause a date comparison and set the lastChangeDate to the most recent of the

  • modified date on included file/url
  • the current last change date

-- JohnRouillard - 18 Jan 2002
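
A minimal sketch of that date comparison, assuming the modification times are already available as epoch seconds (for an included URL this would have to come from its own Last-Modified header). The name newest_change is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: returns the most recent of the current last change
# date and the modification times of everything pulled in via %INCLUDE%.
# Unknown (undef) times for included items are ignored.
sub newest_change {
    my ($current_last_change, @included_mtimes) = @_;
    return max($current_last_change, grep { defined } @included_mtimes);
}

# Example: the topic changed at 100, the included files at 50 and 200.
print newest_change(100, 50, 200), "\n";
```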

Good point about %INCLUDE% - it would be good to define some way of handling most variables without having to do special-case coding, though. Perhaps we could have a 'no cache list' of variables that always generate new output, e.g. %GMTIME% - if any 'no cache' variables are present, special cache-prevention headers could be generated, but the average variable (e.g. %WIKIADMINISTRATOR%) would not turn off caching.

Having had the Last-Modified patch on a test server for a while (and also setting Expiry to 1 day and generating a correct Content-Length), I've noticed a few things that should be considered in updating TWiki to be more cache-aware:

  • IE6 and Mozilla 0.9.7 (and probably other browsers) do cache TWiki-generated pages that have last-modified, expiry and content-length (perhaps not all of these are needed) and don't go back to the server on every use of the view URL. (This is how Opera works even without these headers.) This is good in terms of reducing server load and improving response time, but not so good if the pages change frequently, of course.
  • A default expiry time of 1 hour seems reasonable, since this is also the lock hold time. The expiry time could be administrator controlled.
  • When you Save a page (from Preview) it's natural to expect to see the just-edited version - this needs to be handled as an exception (perhaps by a parameter to the view script called from the Preview form submission), otherwise you see the cached page from before editing (depending on the expiry time).
  • The RefreshEditPage patch will probably be important in any aggressive-caching setup (whether using Opera, or this last-modified setup with other browsers) to avoid the 'second edit loses changes from first edit' problem.
  • Browser reloading works in two ways:
    • 'Soft reload', e.g. IE6's Refresh and Shift/Refresh, and Mozilla's Reload, will check the browser cache and not go to the server if the page is within its expiry time. This is quite counter-intuitive and a source of confusion when I was testing a SpellChecker patch on this server...
    • 'Hard reload', e.g. IE6's Ctrl/Refresh and Mozilla's Shift/Reload, always goes to the server and reloads the page - this seems to ignore Last-Modified as it seemed to reload a page even when the page had not changed since it was cached, but the view script had been updated to change the output. Users would need to be trained to do hard reloads, which most people are unaware of.

Caching is something of a black art, but it's interesting to get some more control over it... UPDATE: This approach has now fixed BackFromPreviewLosesText and may help with SavePreviewTextOnServer - see BrowserAndProxyCacheControl for an overview.

-- RichardDonkin - 19 Jan 2002
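
The 'no cache list' idea and the expiry times floated in this discussion could be combined along these lines. This is only a sketch under stated assumptions: the list contents, the 15 minute figure for WebChanges, the 1 hour default, and the name expiry_seconds are illustrative, not existing TWiki settings.

```perl
use strict;
use warnings;

# Assumptions for illustration only: which variables count as "no cache",
# and the per-topic and default expiry times.
my @NO_CACHE_VARIABLES = qw(GMTIME);
my %TOPIC_EXPIRY       = (WebChanges => 15 * 60);
my $DEFAULT_EXPIRY     = 60 * 60;

# Returns the number of seconds a rendered topic may be cached;
# 0 means "do not cache".
sub expiry_seconds {
    my ($topic, $raw_text) = @_;
    for my $var (@NO_CACHE_VARIABLES) {
        # Matches %GMTIME% and the parameterised form %GMTIME{...}%
        return 0 if $raw_text =~ /%\Q$var\E(?:\{[^}]*\})?%/;
    }
    return exists $TOPIC_EXPIRY{$topic} ? $TOPIC_EXPIRY{$topic} : $DEFAULT_EXPIRY;
}
```

An ordinary variable such as %WIKIADMINISTRATOR% does not appear in the list, so it leaves the default expiry in place.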

Caching isn't a black art and good headers help - and a good Last Modified header is important!

I noticed that TWiki has an undocumented writeHeaderHandler, and I've just added a preIncludeHandler handler to my TWiki as well. Using these it's trivial to add in accurate Last-Modified headers. I'm taking this approach:

  • Added support for preIncludeHandler into Plugins.pm
  • In TWiki.pm, in handleIncludeFile, just after the line $text =~ s/%STOPINCLUDE%.*//os; I've added this line: &TWiki::Plugins::preIncludeHandler($includingTopicName, $includingWebName, $theTopic, $theWeb);
  • I've then simply created a simple plugin that adds in the following two headers:
    • Last-Modified: Date of Modification of Smallest Component
    • X-TWiki-Data: Date of Modification of "Root" Component

Putting this behind a cache for reverse proxy & that behind tux makes for a very scalable webserver by the looks of things - very popular pages can be shifted from the reverse proxy into tux's serve space - a nice side effect of twiki's "non cgi" URLs...

I'm attaching the basic plugin below, and will add the small diffs necessary to the core later.

-- TWikiGuest - 03 Feb 2002

Thanks for posting this, good to see this implemented as a plugin. Some issues to consider are the use of the Expires header, and how to handle variables such as %GMTIME% that allow the user to create a dynamic page that should be made less cacheable (or even uncacheable).

In the code I posted above, I did something similar with the view script, but used the gmtime() function since TWiki seems to work in GMT - it also added a comma and suffixed GMT. Also, see BackFromPreviewLosesText for the code to add both the Expires and Cache-Control headers, which would make the caching much more controllable by the TWiki administrator, rather than using the proxy cache's heuristics. There are also some thoughts there (and on BrowserAndProxyCacheControl) of how we could make the expiry time controllable by the administrator, including the issue of variables making pages more dynamic.

JohnTalintyre is doing something in this area with BackFromPreviewLosesText, for the edit script - I think he's using a plugin as well.

-- RichardDonkin - 04 Feb 2002

Yeah, gmtime is better - given proxies/etc use UTC as their timebase... Using localtime works for me for the simple reason that my localtime is GMT... It strikes me that we're all pulling at the same thing in different directions, and a plugin approach is perhaps the most flexible. Personally I'd favour simple expiry controls like the ones you proposed - and combined with the TopicVarsPlugin, these would allow expiry to be controlled down to the topic level, which is neat.

Maybe we should pull all these threads together on a HeaderIssues page?

-- TWikiGuest - 04 Feb 2002

I'm in GMT land too, but only during the winter :-) I think a plugin makes sense for the cache-control of viewable pages, but the fix to edit for BackFromPreviewLosesText should be in the core TWiki code, as it is essential for IE5 and IE6 to work without losing edits. Strictly speaking, that fix can set a near infinite Expires header, since the Edit state should be kept a very long time; however, the code for this fix is quite small and easily re-used for the view script, so perhaps all this cache stuff could be in the core. Proxy caches are quite common in many enterprises, so this would fit with the TWikiMission.

As for pulling the threads together - how about BrowserAndProxyCacheControl, which has pointers to these pages already?

-- RichardDonkin - 04 Feb 2002

I see this didn't make it into the TWiki Core - looking at this page after over a year I can see why this would've been painful to merge, so I'm making a patch for the extra required plugin hook, and will re-check the plugin works against it.

-- TWikiGuest - 14 Jun 2003

I have coded an implementation for Cairo; see CacheControlHeaders.


Category: TWikiPatches
Topic attachments
Attachment | History | Size | Date | Who | Comment
LastModifiedPlugin.tar.gz | r1 | 1.9 K | 2002-02-04 03:33 | UnknownUser | First Stab at Last Modified Plugin
Topic revision: r20 - 2005-12-30 - RichardDonkin
 