
Feature Proposal: Control of proxy and browser caching through HTTP headers

Motivation

In search engines, it is very useful to be able to search for topics modified in a given period. For heavy-traffic sites, it is also useful to make the browser reuse a cached page rather than re-request it on each view.

Description

The patch makes TWiki emit a Last-Modified date (LMD for short), taken from the metadata date of the topic, and (optionally) an expiration date set N seconds (default 60) in the future. N is controlled by a new variable in TWiki.cfg, $expirationLastDateOffset.

Dynamic TWiki constructs (like %SEARCH) can force the Last-Modified date to be NOW
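
As a rough illustration of the header logic described above (not the patch itself, which is Perl), here is a minimal Python sketch. The function name `cache_headers` and its parameters are hypothetical; `expiration_offset` stands in for $expirationLastDateOffset.

```python
from email.utils import formatdate


def cache_headers(topic_mtime, now, expiration_offset=60, dynamic=False):
    """Return (name, value) header pairs for a viewed topic.

    topic_mtime and now are Unix timestamps. All names here are
    illustrative assumptions, not identifiers from the patch.
    """
    # Dynamic content (e.g. a page with %SEARCH) is treated as modified now.
    last_modified = now if dynamic else topic_mtime
    return [
        ("Last-Modified", formatdate(last_modified, usegmt=True)),
        ("Expires", formatdate(now + expiration_offset, usegmt=True)),
    ]
```

With an offset of 0, the Expires value equals the current time, so the page is stale as soon as it is delivered.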

-- ColasNahaboo - 19 Aug 2005

Impact and Available Solutions

Note: Patch is attached as http://www.twiki.org/p/pub/Codev/CacheControlHeaders/LastModifiedDate-HttpHeader.diff. The patch is against the TWiki Cairo release.

Documentation

(patches may have some offset lines)

  • Apply patches in order.
  • In your bin/ directory, run:
    sed -e "s/force = ''/force = 1/" <view >viewf; chmod a+x viewf
    ln -s viewf viewauthf
  • In bin/.htaccess, give viewauthf the same protections as viewauth, i.e. write:
    <Files "viewauthf">
    require valid-user
    </Files>
  • If you have special protections for your view script in bin/.htaccess (normal installations do not), give viewf the same rights/protections as view.
  • (Optional) Try to make editf as non-cacheable as possible, in bin/.htaccess:
    <Files "editf">
    ExpiresDefault "access"
    Header set Cache-control max-age=0,no-cache,no-store,must-revalidate
    </Files>
    (The first line needs the Apache module expires_module, the second headers_module.)

Be sure to clear any value of HTTP_EQUIV_ON_VIEW in your global TWiki.TWikiPreferences; otherwise the patch will have no effect on expiration dates:

   * Set HTTP_EQUIV_ON_VIEW =

You can now set $expirationLastDateOffset in lib/TWiki.cfg to the value N you want. On sites heavily edited by multiple authors and with no performance problems, set it to 0. On internet sites with performance problems, where pages are edited by people aware that they may have to hit reload to see new versions, you can set it to 3600 (one hour). Values from 60 (the default) to 600 should be a good middle ground. This means that users returning to a page they visited less than N seconds ago will see the old version without the server being asked to generate a new one, except for dynamic pages with %SEARCH in them, which will always be recomputed.

Also pages will be now correctly dated, so that google-like search engines will be able to provide a more accurate search.

Examples

Implementation

TWiki::writeHeader and TWiki::writeHeaderFull gain a new optional parameter, $lastModifiedDate, giving the date in Unix time. The view script passes the date found in the parsed meta value to the header generation code.

Dynamic constructs should be modified to include the lines:

    # set last-modified date in http headers to now
    $TWiki::UI::View::lastModifiedDate = '';
Version 1.1 of this patch did this for %SEARCH, but it should be done for other constructs too.

TODO: put the above lines in all the %-constructs generating dynamic contents. Note that you can also put a date (number of seconds since 01 Jan 1970) in it if you can precisely know the "freshness" date of the generated contents

%INCLUDE has been modified in the 1.2 version of the patch to set $TWiki::UI::View::lastModifiedDate to the date of the most recent of the included files and the including one
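
The rule described above (dynamic constructs force "now", %INCLUDE takes the most recent date) can be sketched as follows. This is an illustration in Python under assumed names, not the patch's Perl code; `effective_last_modified` and its parameters are hypothetical.

```python
import time


def effective_last_modified(topic_mtime, included_mtimes=(), dynamic=False, now=None):
    """Pick the Last-Modified timestamp for a rendered page.

    topic_mtime: Unix time of the including topic.
    included_mtimes: Unix times of topics pulled in via %INCLUDE.
    dynamic: True if a construct such as %SEARCH was rendered.
    """
    if dynamic:
        # Freshness cannot be known, so the page counts as modified now.
        return time.time() if now is None else now
    # Otherwise the page is as fresh as its most recently changed part.
    return max([topic_mtime, *included_mtimes])
```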

The problem now is that when one edits a topic, on save the topic will appear not to have changed: the browser thinks that the topic has not changed, based on its previous LMD, and does not refetch it. The solution I have found is to make all TWiki-generated redirects to view of topics redirect to a new script viewf (f for force) that emits the same topic as view, but with an LMD of now and an expiration date of now as well. This is controlled by a new request-global variable $viewForce that the view script sets to '' and the viewf script sets to 1.

Also, for view-protected pages, we need to take into account the view/viewauth antics. The simplest way I have found was to:

  • create a viewauthf script which is to viewf what viewauth is to view
  • exclude the view=>viewf conversion in TWiki redirects if we see /viewauth in the url

But this means that the trick of adding the server view time to the edit URL does not work anymore, as the view page could be reused many times, making the user edit a previously edited old version fetched from the browser cache. We need to make the edit URL, in skins or as generated by the engine, call editf, a non-cacheable page that in turn redirects to the edit page with a time parameter computed at the time of the click on edit, not at the time the page was viewed. This redirected edit page will, by contrast, be very cacheable, to avoid losing edits under IE when going back/forward in the browser.
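
To make the editf idea concrete, here is a hedged Python sketch of what such a script's response could look like. The actual editf script is in the attached editf.tgz and is not reproduced here; the function name, the `t` parameter format, and the 302 status are assumptions for illustration.

```python
import time


def editf_response(edit_url, now=None):
    """Return (status, headers) for a non-cacheable redirect to the edit page.

    The t= value is computed when the user clicks edit, so a view page
    served from cache cannot carry a stale timestamp into the edit URL.
    """
    now = int(time.time()) if now is None else now
    separator = "&" if "?" in edit_url else "?"
    headers = [
        ("Location", f"{edit_url}{separator}t={now}"),
        # The redirect itself must never be reused from a cache.
        ("Cache-Control", "max-age=0, no-cache, no-store, must-revalidate"),
    ]
    return "302 Found", headers
```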


Discussion:

Tags to be modified to change the LMD: (non-exhaustive)

  • %TOPICLIST%
  • %WEBLIST%
  • %DATE%
  • %GMTIME%
  • %SERVERTIME%
  • %DISPLAYTIME%

-- ColasNahaboo - 22 Aug 2005

Note: if you took the patch on Aug 22, please apply the 3rd one, create the viewauthf script, and add its entry to your bin/.htaccess.

-- ColasNahaboo - 23 Aug 2005

If you want to force in all cases the expiration date to be immediate, I recommend also putting in the VIEW template for your skin the html meta tags in the head:

  <meta http-equiv="expires" content="-1">
  <meta http-equiv="pragma" content="no-cache">
  <meta http-equiv="cache-control" content="no-cache">
Firefox especially seems not to honor the expiration date when a Last-Modified date is present.

-- ColasNahaboo - 31 Aug 2005

I was forced to abandon the bin/viewf solution, as it could only work if browsers always obeyed the expiration date, which is not the case. Instead I resorted to redirecting to a URL with ?t=number appended to it (or &t= if there are already parameters). The code does this only for bin/view* URLs (thus also for viewauth), and does not re-add it if it is already there. Implementation:

  • You can remove everything about viewf which is not used anymore
  • apply the patch after all the other ones above: ForceViewReloadFromRedirects-4.diff: addendum to the above: no more /viewf, but ?t=xxx
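
The redirect rewriting described above can be sketched roughly as follows. This is a Python illustration under assumed names (`add_reload_param` and the regular expressions are hypothetical), not the contents of ForceViewReloadFromRedirects-4.diff; it also ignores URL fragments for simplicity.

```python
import re
import time

# Matches bin/view, bin/viewauth, etc., but not other scripts such as bin/edit.
_VIEW_URL = re.compile(r"/bin/view[^/?]*(/|\?|$)")
# Detects an existing t= parameter so it is not added twice.
_HAS_T = re.compile(r"[?&]t=")


def add_reload_param(url, now=None):
    """Append a time-based t= parameter to view URLs to defeat browser caches."""
    if not _VIEW_URL.search(url) or _HAS_T.search(url):
        return url
    now = int(time.time()) if now is None else now
    separator = "&" if "?" in url else "?"
    # Hexadecimal keeps the parameter short.
    return f"{url}{separator}t={now:x}"
```

Because the URL differs after each save, the browser treats it as a new resource and cannot reuse the cached copy of the old page.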

-- ColasNahaboo - 05 Sep 2005

Variant: if you want to limit the argument "t" to 4 hex digits (a span of about 18 hours, since 0xffff seconds is 65535 seconds), replace the 2 lines in TWiki.pm:

$url .= sprintf("&t=%x", time());
by:
$url .= sprintf("&t=%x", time() % 0xffff);

-- ColasNahaboo - 06 Sep 2005

We use &t=%GMTIME{"$epoch"}% on DevelopBranch.

-- CrawfordCurrie - 06 Sep 2005

Added a patch (ForceViewReloadFromRedirects-5.diff) so that ?t=xxx is not added after save when $expirationLastDateOffset is 0 (it is not needed in this case).

-- ColasNahaboo - 30 Sep 2005

Colas, the HTML meta tags you gave to stop caching are not the accepted way of handling this. The correct way is to add HTTP headers alongside the date header discussed here.

The three HTTP headers to add to stop caching are (conventional capitalization shown; HTTP header names themselves are case-insensitive):

Cache-Control: no-cache
Expires: Wed, 28 Dec 2005 18:53:15 GMT
Pragma: no-cache

Obviously, the Expires date should be set properly to before right now. Also, busting cache has implications for the "back" button in the browser.
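
As a small illustration of generating these three headers with an Expires date that is always in the past, here is a Python sketch. The function name `no_cache_headers` is hypothetical, and the choice of one second in the past is an arbitrary assumption.

```python
from email.utils import formatdate


def no_cache_headers(now):
    """Return the three cache-busting headers for a response sent at `now`.

    now is a Unix timestamp; Expires is set one second earlier so that
    the response is already stale when it arrives.
    """
    return [
        ("Cache-Control", "no-cache"),
        ("Expires", formatdate(now - 1, usegmt=True)),
        ("Pragma", "no-cache"),
    ]
```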

Managing these HTTP headers is the only universal way to control browser and proxy caching. Most proxy servers will ignore the HTML meta tags.

BTW, if TWiki sent an HTTP "Last-Modified" header in its response, subsequent browser requests would include an HTTP "If-Modified-Since" request header, which TWiki could use to increase performance by sending a "304 Not Modified" response, without a body, where appropriate.
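
The conditional request handling suggested here could look roughly like the following Python sketch. It only shows the date comparison; `status_for_request` is a hypothetical name, and how TWiki would obtain the request header and the topic date is left out.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def status_for_request(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    if_modified_since: raw request header value, or None if absent.
    last_modified: Unix timestamp of the page (see Last-Modified).
    """
    if not if_modified_since:
        return 200
    try:
        parsed = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: ignore the header
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so compare whole seconds.
    return 304 if int(last_modified) <= parsed.timestamp() else 200
```

A 304 response lets the server skip rendering the topic entirely, which is where most of the saving would come from.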

-- TomKagan - 28 Dec 2005

Just saw this page, which I had been ignoring due to the uninformative title (HTTP dates are a tiny detail of CacheControlHeaders, which is how I'm renaming this page...)

I researched this a lot a few years back, and implemented two key cache-related bug fixes relating to page editing. Some of the existing TWiki cache coding is required to work around issues with InternetExplorer and OperaBrowser, and the one thing I know is that caching is very hard to get right, and very dependent on bugs in proxy caches and (particularly) browsers. The following pages have some useful information and discussion:

  • BrowserAndProxyCacheControl - overview and links to research
  • BackFromPreviewLosesText - major issue with IE 5 and 6, in which the Back key causes you to lose text. May be less of an issue now that Preview is not mandatory, but I believe that in a very few cases BackFromPreviewStillLosesText when doing Preview and Back. [I think Colas' code addresses this]
  • RefreshEditPage - fix for Opera's aggressive caching behaviour, which caused 2nd edit of specific page in a session to fail to retrieve latest page state. [Colas' suggestion of moving the edit URL suffix generation to the client is a good one, where JavaScript is enabled, but you can't rely on that always being the case.]
  • ViewAfterSaveCachesOldPage - Colas' code may or may not fix this.
    • UPDATED - This page has a good discussion between me, Colas and AndrewMoise that outlined an approach to configurable cache control to suit different types of TWiki deployments and user bases.

I agree that HTML meta tags are not useful to stop caching, and should be avoided.

Getting caching to work better is a hard problem, and a suitably configurable solution is needed to address different scenarios such as:

  1. Small workgroup in single office that edits TWiki pages many times per hour - virtually no caching is acceptable here.
  2. Large corporation with many users across timezones and slow wide-area-network (WAN) links - proxies are deployed widely and important to get good performance, so a few hours' cache expiry is OK on many pages, but may need to vary across webs or TWiki sites. Not controlling proxy caching may result in overly stale pages being served.

Proper control of caching is greatly complicated by features such as embedded searches (FormattedSearch) and per-user skins. The cache plugin work may have some good discussion here as well.

Given that this work started in the summer, I'd be astonished if it has correctly addressed all the issues (e.g. re Tom's comments on breaking the Back button). Even if it were bug-free with respect to common browsers and proxy cache software, I don't think that it is sufficiently configurable.

We should not put this into DakarRelease unless we want to delay Dakar for long enough to get this feature fully baked and tested in a lot of different environments.

-- RichardDonkin - 30 Dec 2005

Interesting approach to improve performance. However, we should not take this into DakarRelease since it is in code freeze.

-- PeterThoeny - 30 Dec 2005

Richard, good point about the embedded searches and per-user skins. What can help in this case is the HTTP "Etag:" header, and possibly a "Vary:" header marking the Etag.

-- TomKagan - 30 Dec 2005

I suggest we change the proposed release for this feature to EdinburghRelease, to avoid delaying Dakar.

It's also worth noting that Firefox 1.5 has new back button behaviour compared with Firefox 1.0.x and presumably most other Mozilla/Gecko based browsers: it is now much more aggressive, like Opera, though it remains to be seen if it has the same RefreshEditPage behaviour.

There have been several people working on server-side caching for TWiki, and much investigation of caching issues that is relevant to caching dynamic TWiki pages:

  • CacheAddOn - used quite a lot
  • TWikiCacheAddOn - re-implementation, can do per-user caching to handle per-user skins, and was apparently used to great effect at TWiki.org while running on slow hardware.
  • CacheChooserAddOn - more controllable by users, may be suitable for technical user base

Also, the TWiki built-in plugin, VarCachePlugin, caches the results of evaluating variables (e.g. searches), but not the actual web page resulting from view. This plugin has been used to CacheWebRssFeedForSpeed and for TWikiOrgTopicCaching (of WebIndex pages, etc). VarCachePlugin caches at an intermediate stage that's not directly relevant to cache control headers but could be very helpful in figuring out how to generate correct ETags that are used in such headers.

The add-on authors have some useful experience of caching dynamic TWiki pages, and one comment identifies the huge range of potential dependencies (plugins, searches, embedded TWiki variables, etc.) and changes (e.g. renames) that can invalidate a cache entry (or set of entries). There's also a huge list of cache-related pages at CacheAddOnDev.

The dependency tracking issue (i.e. when to invalidate a cache entry, or to not cache in the first place) may well be too hard to solve completely in the short term, and is more of a server-side problem perhaps, but it could be important to use the ETag as mentioned to distinguish between pages that are truly dynamic or per-user, and those that are cacheable across users or even for a single user. The trouble is that TWiki is so dynamic in its use of TWikiVariables that it might be necessary to only create a 'static page' ETag when the TWiki code is fairly sure the page is static.
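
One possible shape for such a 'static page' ETag is sketched below in Python. Everything here is an assumption for illustration: the function name, the choice of inputs (web, topic, revision, user, skin), and the use of SHA-1 as a fingerprint. Deciding whether a page really is static is the hard part and is simply passed in as a flag.

```python
import hashlib


def make_etag(web, topic, revision, user=None, skin=None, is_static=True):
    """Return a quoted ETag value, or None when no ETag should be sent.

    Including user and skin keeps per-user renderings of the same
    revision from sharing one cache entry.
    """
    if not is_static:
        return None  # dynamic page: do not claim it can be revalidated
    parts = [web, topic, str(revision), user or "", skin or ""]
    digest = hashlib.sha1("\x00".join(parts).encode("utf-8")).hexdigest()
    return f'"{digest}"'
```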

-- RichardDonkin - 02 Jan 2006

I agree that this is too early for Dakar. We have been running with the above modification at ILOG for some time, but we found that we could not get satisfactory behavior with caching, as it will always trap users in some bad cases (editing a previous version, saving and not seeing changes). So we enabled the above code only when queried by our search engine ( http://www.aspseek.org/ ), a nice free Google clone.

I now think that we should not use any HTTP caching, which is too hard to get right across browsers and proxies, but aim for server-side caching (generating pre-computed pages). But this definitely needs more experience.

-- ColasNahaboo - 04 Jan 2006

It's hard to get HTTP caching right, but in some companies proxy caches are mandatory for intranet performance as well as the Web, and of course virtually every Web user has caching enabled in their browser.

So I think it's worth the effort to come up with a caching approach that uses HTTP headers, if only to enable the View page expiry time to be configured (could be left as 'expire now' by default).

However, it probably makes sense to push ahead first with server-side caching as this is better at handling dependencies such as dynamic variables. It only addresses the CPU overhead of TWiki, but in most cases that will be the primary cause of a slowdown.

One simple but important point for server-side caching is to change skins to include (A) time of cached copy and (B) a Refresh button. Point (A) is also important for proxy/browser caching - a suitable TWikiVariable should be enough.

-- RichardDonkin - 04 Jan 2006

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| ForceViewReloadFromRedirects-2.diff | r1 | 1.3 K | 2005-08-23 - 09:45 | ColasNahaboo | addendum to the above |
| ForceViewReloadFromRedirects-3.diff | r1 | 0.8 K | 2005-08-23 - 16:49 | ColasNahaboo | addendum to the above: call editf from %EDITURL |
| ForceViewReloadFromRedirects-4.diff | r1 | 2.6 K | 2005-09-05 - 16:21 | ColasNahaboo | addendum to the above: no more /viewf, but ?t=xxx |
| ForceViewReloadFromRedirects-5.diff | r1 | 0.8 K | 2005-09-30 - 09:52 | ColasNahaboo | addendum to the above: no ?t=xxx when $expirationLastDateOffset is 0 |
| ForceViewReloadFromRedirects.diff | r1 | 2.1 K | 2005-08-22 - 17:22 | ColasNahaboo | force reload after a save. apply AFTER the above patch |
| LastModifiedDate-HttpHeader.diff | r2 r1 | 8.9 K | 2005-08-19 - 17:31 | ColasNahaboo | Patch to cairo, v1.2 |
| editf.tgz | r1 | 0.8 K | 2005-08-23 - 16:23 | ColasNahaboo | editf script to add to bin/ |
Topic revision: r22 - 2007-05-05 - WillNorris
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.