
Bug: View after Save caches old page

This was previously reported indirectly as the reason for SavePreviewTextOnServer, see discussion there.

Symptoms: On saving a page, the View page that results is out of date and doesn't show your most recent changes. However, forcing the page to reload, bypassing all caches, does show the correct page.

Test case

Intermittent, but happens occasionally on TWiki.org:

  • Edit a page, make a change, Preview and Save
  • (sometimes) your change will not show up on the View page
  • Hit Back and do View Source so you can save the actual HTML of the Preview page for later
  • Hit Forward (not Save) and try hitting Refresh (IE) or Reload (Netscape)
    • If the latest changes appear, you have this bug, please report details below.
  • Hit Ctrl/Refresh (IE) or Shift/Reload (Netscape/Mozilla) to bypass any proxy caches
    • If the latest changes do not appear, you may have the non-cache-related variant of this bug
  • Try using another browser to get the view page, e.g. OperaBrowser (only a 1 MB download) or MozillaBrowser.
    • If you don't have another browser, try using another PC.
    • If you can't find another PC, just start another browser window on your PC (not such a good test as it will use the same cache)
  • Whatever your approach, if the latest changes to the page still don't appear, you most probably have the non-cache-related variant of this bug - see ViewAfterSaveLosesText, and report it there (please attach the 'view source' output from earlier).

Environment

TWiki version: TWiki.org (as of 20 Jan 2002, i.e. Dec 2001 code?)
TWiki plugins: Interwiki, Calendar, TWikiDraw, etc.
Server OS: Linux (SourceForge)
Web server: Apache 1.3.20 (from Netcraft test of twiki.org)
Perl version: ???
Client OS: Win2000 SP2
Web Browser: IE 5.5 SP2, Mozilla 0.9.9

CGI.pm version: ???

Proxy cache: SourceForge server-side cache? See BrowserAndProxyCacheControl.

-- RichardDonkin - 19 Mar 2002

Follow up

The two variants of this bug are somewhat confusing - this one is a caching bug. See ViewAfterSaveLosesText for some discussion of possible fixes for this bug through BrowserAndProxyCacheControl.

NOTE: If you get this bug, please record your CGI.pm version, and any use of proxy caches. This bug has been lingering around for many months...

-- RichardDonkin - 19 Mar 2002

(Some older info on this bug, pulled in from SavePreviewTextOnServer:)

I've had this a few times using IE6 (or at least what I think is the same problem), on twiki.org - I was using the JunkBuster non-caching proxy for this test. (Steps deleted)

-- RichardDonkin - 20 Jan 2002

I just had this again at TWiki.org when editing FeedReader - I did one edit to add some text, another edit to delete it, and then a final edit to add some (different) text. The View page generated after the final Preview omitted the third edit's text, but a Refresh using IE 5.5 showed the correct final version. Since I was using a proxy cache at the time, I think this is more evidence of a cache-related problem, as mentioned above.

-- RichardDonkin - 12 Feb 2002

I had this bug again, while editing RefreshEditPage at TWiki.org using MozillaBrowser 0.9.9: I hit Preview, then hit browser Stop button, then Back, then made some changes, then Save again. Not sure if the Stop-then-Back is relevant - it shouldn't be - but interesting to see that this is browser-independent.

Hitting Shift/Reload on the view-after-save page did show the latest version of the page - so this is probably due to the server-side proxy caches at SourceForge (mentioned by PeterThoeny at EditKnowsMyNameButSavesChangesAsTwikiGuest - does anyone know a URL with more docs?).

Looks like a caching problem, so a RefreshEditPage type solution for view-after-save should work.

UPDATE: I just had this bug again while editing RichardDonkin - I didn't hit the stop button at any point, so that's not relevant. Also, a normal Refresh in IE5.5 worked fine - no need to hit Ctrl/Refresh or whatever in this case.

-- RichardDonkin - 21 Mar 2002

Fix record

This is probably easily fixed by making the 'view after save' URL include a variable string, e.g. the time=NNNN value used by RefreshEditPage. This reliably turns off browser and proxy caching (even OperaBrowser's) and is easily implemented. Unlike the use of caching headers, it enables the browser's Back button and history to work fine, while still ensuring that old page content is not shown when a new page should be shown instead.
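As a rough illustration of the variable-string idea, here is a minimal Python sketch (TWiki itself is written in Perl, so this is not TWiki code). The function name `view_after_save_url` and the default parameter name `t` are assumptions made for the example; it simply appends the current epoch time to the view URL so that each view-after-save gets a URL no cache has seen before.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def view_after_save_url(view_url, now=None, param="t"):
    """Return view_url with a unique time-based query parameter.

    Hypothetical helper: 'param' and the epoch-seconds value are
    illustrative choices, not TWiki's actual implementation.
    """
    stamp = int(time.time() if now is None else now)
    parts = urlsplit(view_url)
    # Drop any earlier cache-busting value, keep every other parameter.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The save script would redirect to the returned URL instead of the plain view URL; ordinary links to the topic stay unchanged and remain cacheable.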

-- RichardDonkin - 19 Mar 2002

No, the real solution is to make the view expire immediately!!!

Dangerous suggestion by Colas deleted, since it also recreated BackFromPreviewLosesText in cross-browser form - see BugInHttpEquiv -- RichardDonkin

-- ColasNahaboo - 21 Mar 2002

Doing this immediate expiry will affect every View page - do you really want to have to go to the server when you hit the Back button to a View page? This would in fact not prevent caching by proxy caches, which never look at the HTML content, only the HTTP headers - so this wouldn't actually work in a proxy cache environment. See BrowserAndProxyCacheControl for some background on this.

Using a RefreshEditPage approach, i.e. unique URL for view-after-save, ensures that when doing view after save, the server is always checked for the latest page. However, someone just browsing View pages can benefit from browser and proxy caching, because the normal view URL is used for all View pages that are not the immediate result of hitting the Save button.

You can simulate this approach by clicking on http://twiki.org/cgi-bin/view/Main/WebHome?t=11111 and then modifying the URL to say t=22222 and hitting Enter - you will see your modem/network card going active, due to your browser accessing TWiki.org (which is what's wanted for view-after-save). Now hit Back, and then Forward - your browser has cached the page under both URLs, so it's very quick to do this, and you won't see any Internet access. This is the goal - that when you first use a t=NNNN type URL (on a view-after-save) the browser gets the latest page from the TWiki server, but browser Back and Forward buttons work quickly due to browser caching.
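The behaviour described above can be modelled with a toy cache keyed by the full URL. This is only a sketch in Python of the general idea, not of any real browser's cache; the class name `UrlKeyedCache` and the `fetch` callback are hypothetical.

```python
class UrlKeyedCache:
    """Toy model of a cache that stores one entry per full URL."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that "goes to the server"
        self._store = {}         # full URL (including query) -> page body
        self.fetch_count = 0     # how many times the server was contacted

    def get(self, url):
        # A URL never seen before is always a miss, so a new t=NNNN
        # value forces a trip to the server.
        if url not in self._store:
            self.fetch_count += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


cache = UrlKeyedCache(lambda url: "page for " + url)
base = "http://twiki.org/cgi-bin/view/Main/WebHome"
cache.get(base + "?t=11111")   # miss: fetched from the server
cache.get(base + "?t=22222")   # miss: different URL, fetched again
cache.get(base + "?t=11111")   # hit: Back/Forward is served locally
```

After these three requests the server has been contacted twice, which mirrors the experiment: each new `t` value triggers network access, while Back and Forward reuse the stored copies.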

One additional refinement is to set the HTTP 1.0 and 1.1 headers to ensure that proxy caches don't store the view-after-save pages - typically they won't cache a URL containing a '?', but this is not guaranteed so it's best to set some cache control headers as well.
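A minimal sketch of such headers, in Python for illustration only (the function name `no_store_headers` is hypothetical, and the exact set of directives is one common choice rather than what TWiki sends):

```python
def no_store_headers():
    """Headers asking browsers and proxies not to store the response."""
    return {
        # HTTP/1.1 caches
        "Cache-Control": "no-cache, no-store, must-revalidate",
        # HTTP/1.0 caches that do not understand Cache-Control
        "Pragma": "no-cache",
        # An invalid date such as "0" is treated as already expired
        "Expires": "0",
    }
```

These would be sent only on the view-after-save response, alongside the unique URL, so that normal view pages keep whatever caching policy the site prefers.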

Since you came up with the 'each version of page should have a unique URL' idea in RefreshEditPage, I expected you to agree with this :-)

-- RichardDonkin - 21 Mar 2002

The problem is that pages can effectively change at any second on a wiki (they can be edited by others). So when you go back to a page, it can actually be changed. And even with the ?time= hack on view after save, suppose you edit, save, go to a link, and go back to page: you risk getting the old version and see your changes gone.

And you are right about the proxy cache. I checked, and I actually changed TWiki code to also add these expires in the headers.

-- ColasNahaboo - 21 Mar 2002

I think we need to distinguish between three cases:

  • Using the Back button in browser to return to a View page - here it is completely acceptable, according to the HTTP/1.1 spec, for browsers to record in-memory copies of previously visited pages in the 'browser history' (as OperaBrowser does), and for TWiki to cooperate with this.
    • Browsers in fact use different rules for this usage of the cache, termed 'browser history', since the idea is that the user should be able to see the page as it was when visited during this session. Section 13.13 of the HTTP/1.1 spec sets out the difference between history mechanisms (i.e. Back/Forward functions) and caching.
  • View page after pressing the Save button - the View page resulting from this should never be cached
    • There are no exceptions, since caching in this situation results in confusion for the user.
  • Using a normal link to visit a View page - this should not be cached if you really want the absolute latest version of the page.
    • However, note that this imposes a heavier load on large TWiki sites (particularly busy Internet sites) and may not be worthwhile if the pages change slowly (e.g. once a week). The load on a wide area network within a corporation may be significant, which is why some companies deploy proxy caches at remote sites within their intranet, for intranet sites as well as Internet sites - if there are 100 people at a remote site hitting the same TWiki pages, which don't change on a daily basis, why not cache them for up to (say) 1 hour? The pages will be served much faster from the proxy cache, network load is reduced, and updates will be visible within at most 1 hour. You can change the 1 hour expiry to 5 minutes, 10 seconds or 'expire now', if you prefer, depending on your requirements for the timeliness vs. performance trade-off.

Of course, the caching of normally-visited View pages is highly site dependent, and might even vary across TWiki webs - it would need to be quite configurable and is not something that we can implement in the short term (partly because of the use of %-variables in TWiki, which complicate calculations of 'last-modified' times that affect caching.)
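To make the distinction concrete, here is a small Python sketch of how a per-request policy might be chosen. The names `cache_control_for`, `view_after_save`, `normal_view` and `normal_view_max_age` are all hypothetical, and the Back-button case is deliberately absent because browser history is outside the control of response headers.

```python
def cache_control_for(kind, normal_view_max_age=3600):
    """Pick a Cache-Control value for one response.

    kind: "view_after_save" or "normal_view" (illustrative labels).
    normal_view_max_age: site-configurable lifetime in seconds;
    0 means 'expire now'.
    """
    if kind == "view_after_save":
        # Never cache the page shown immediately after Save.
        return "no-cache, no-store, must-revalidate"
    if kind == "normal_view":
        # 3600 = 1 hour, 300 = 5 minutes, 10 = 10 seconds, 0 = expire now.
        return "max-age=%d" % max(0, normal_view_max_age)
    raise ValueError("unknown request kind: %r" % kind)
```

A site that wants every view to be fresh would configure `normal_view_max_age=0`; a large intranet behind proxy caches might choose an hour.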

However, fixing this bug is like RefreshEditPage, i.e. it's important to defeat caching in the view-after-save case, but not in the 'Back button' case. (And in fact, with browsers such as Opera, it's impossible to defeat the 'back button' caching used by the browser history - cache control headers are ignored by the browser here. If you'd like to experiment, see the tools linked from BrowserAndProxyCacheControl.)

Ironically, I just had another cache-related problem while editing this page - I opened another IE window and caused BackFromPreviewLosesText [losing all of the text in this comment!] - this bug is fixed in TWikiAlphaRelease but not yet on TWiki.org. It's also interesting that InternetExplorer's browser history is broken in one way (i.e. opening a new window causes it to ignore the cached version of a page when you hit Back); but another piece of IE brokenness allows TWiki to fix this (i.e. it listens to cache control headers in the browser history function, when really it shouldn't...)

-- RichardDonkin - 21 Mar 2002

Ok, look at this:

  • I edit page A, save, so TWiki makes me see view/A?time=nnn
  • then I go to other page(s). If a link brings me back to A I will see the old version of the page, quite unsettling.
Note that this is not the back button problem.

But you are right. A reasonable solution would be to be able to set the expiry date to "now + T" where T would be 10s for me and 1 hour for you.

-- ColasNahaboo - 21 Mar 2002

In the second step of your example, the link that brings you back to A would not have any ?time=NNNN suffix (that's only used for view-after-save), so normal browser/proxy caching would apply (i.e. the third case that I outlined above).

As long as you set the cache control headers on 'normal view' pages to 'expire now' or 'expire in 10 sec', you would always see the latest version of the page.

I agree about configurability of caching. The key thing is that the 'normal view' expiry is easily configurable from 'now' to seconds to hours, depending on the preferences of the TWiki administrator, the proxy cache administrator, etc. If the TWiki site is fairly small, it's not a big deal to bypass caching - it's likely to become more of an issue for really huge TWiki deployments with many different user groups. See BackFromPreviewLosesTextOld for some discussion of this (at the end.)

In any case, any fix for this bug will not affect 'normal view' page caching at all, only view-after-save.

-- RichardDonkin - 22 Mar 2002

Okay, just been bitten by this bug as well. I was running a dev server with the current December 2001 release with no problems. I've been moving over to a production box this week and decided to try the latest beta (TWiki20020414beta) but saw the problem of the edit page caching the previous text and not getting the latest text. I've just rolled back to the Dec 2001 release and the problem seems to have disappeared again. On the beta release I attempted the expires header but it didn't solve the problem.

Just an observation.

Cheers

-- NathanReeves - 03 Jul 2002

Just noticed your comment. The problem you had sounds like RefreshEditPage, which is fixed in the TWikiBetaRelease (probably the fix was in the beta you tried, but I'm not sure). You might have some other problem - best to open a new BugReport if you still have this.

-- RichardDonkin - 17 Oct 2002

Dangerous suggestion by Colas deleted above, since it also recreated BackFromPreviewLosesText in cross-browser form - see BugInHttpEquiv. Those who want to live dangerously can see it in the history of this page!

-- RichardDonkin - 01 Jul 2003

I ran into this in an installation of twiki (Feb 2003) on a Debian woody server. I've attached a patch that seemed to work for me, which simply adds headers requesting that non-edit pages never be cached by the browser. This is with CGI.pm version 2.81.

-- AndrewMoise - 15 Nov 2004

Interesting patch, but I think it's simpler and more reliable to just redirect to view?time=NNNNN, as in RefreshEditPage, as that will defeat caching by all browsers and proxy caches, even those that incorrectly ignore cache control headers. You could set the headers as well but that didn't prove necessary in RefreshEditPage.

Also, the following two lines seem to be inconsistent - you could in fact use the same code as BackFromPreviewLosesText, but set the $expireHours variable to zero, ensuring consistency by calculating seconds from hours.

-- RichardDonkin - 16 Nov 2004

... except that using view?time=NNNNN won't help with a cached attach page, which bit one of my users not five minutes ago (attach image, attach another image, return to attach page, "Hey, it didn't attach the second image!"). Broken caches are, well, broken, but right now twiki's pages are not cacheable and it's telling the browser (via its silence on the matter) that they are. That's wrong, and it causes problems -- appending ?time=NNNNN to every twiki URL is a solution, but I think cache control headers are cleaner.

Of course, the fact that my patch didn't help with attach means that it doesn't really work; why that is I'm not sure. I did notice that formatGmTime($now) doesn't include seconds, so I switched to $expiresString = formatGmTime($now, 'http'), which includes seconds. I'll let you know if that seems to work for me.

I agree with your elegance complaints. I'm just worried about correctness right now; consistency can come later :-).

-- AndrewMoise - 16 Nov 2004

Okay, my understanding of cache control directives was just botched. I've updated the patch, and I think this one works. It now simply sets "expires" to 2 days ago and "last-modified" to 1 day ago. I've been using this one for the last few days, and I haven't heard any complaints or seen any misbehavior.
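For readers without the patch to hand, the effect described can be sketched as follows. This is a Python illustration under stated assumptions, not the contents of the attached patch: the function name `already_expired_headers` is hypothetical, and only the two offsets (2 days and 1 day) come from the description above.

```python
import time
from email.utils import formatdate

DAY = 24 * 60 * 60  # seconds


def already_expired_headers(now=None):
    """Headers that mark a response as stale from the moment it is sent."""
    now = time.time() if now is None else now
    return {
        # HTTP dates are always GMT and include seconds.
        "Expires": formatdate(now - 2 * DAY, usegmt=True),
        "Last-Modified": formatdate(now - 1 * DAY, usegmt=True),
    }
```

Because `Expires` is earlier than the response time, a cache that honours it has to go back to the server before reusing the page.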

FWIW, there is a way to allow pages to be cached by the browser and yet correct the edit-view-"hey it didn't change" bug. If view takes on the functionality of save and upload (and any other scripts that directly modify a page), then browsers will know that the page needs to be reloaded simply because the POST request to view (to save the page or whatever) isn't cachable. That seems like the best of all solutions to me -- indirect changes to a page (e.g. changing a variable that the page depends on) won't always take effect immediately, but that seems okay. Pages will be cached for typical (non-page-editing) users, keeping performance good, but a page will never appear to change back to the before-it-was-edited version as is possible now.

-- AndrewMoise - 19 Nov 2004

Your improved patch does address the immediate problem, but it means that all view pages are uncacheable, which is not great. We would need to address attach as well, as you say. My comment above summarises my view of what we need to do w.r.t. caching, though it should be modified to change 'after Save' into 'after any POST operation', including Save, Attach, etc.

See RefreshEditPage for why we ended up going with the ?t=NNNN solution - the problem is that some browsers, e.g. Opera, cache very aggressively and this seemed to be the only way to stop them doing this. This URL suffix would only be needed on some pages, i.e. those resulting from POST operations as you point out. Rewriting scripts into a single dispatcher is an interesting idea but may make some things harder, e.g. using .htaccess style access control security.

-- RichardDonkin - 21 Nov 2004

Hmm... well, the current behavior is basically random. Non-edit pages don't get cache control headers, which means that their cachability is subject to the vagaries of the user's browser, any proxies in use, the version of CGI.pm installed, etc. I think explicitly caching no pages would be an improvement over that.

That said, actually doing some sort of intelligent caching would be even better. How about this?:

  • view pages with %SEARCH% or %INCLUDE% are given immediate-expiration headers as in my patch below.
  • view pages without those directives get a Date header with their last modification date, letting the browser use whatever caching heuristics it thinks are appropriate.
  • edit pages get the current caching headers and t parameter. This isn't ideal; one of my users manually added an edit link to a page of his and then was confused when the resulting page (with no t parameter) was cached. He wasn't using IE (and that long expiration time is only needed to work around a stupid IE bug), so making the long expiration of edit pages only happen when the User-Agent is MSIE would have solved his problem. I think that's more trouble than it's worth though.
  • All other pages get immediate-expiration headers, since there won't be a big performance win for them.
  • save emits javascript which bounces the user back to view via a POST request, so that ViewAfterSaveCachesOldPage doesn't happen. I recently learned that javascript could do this, though I still don't know how. There could also be a 1-second refresh to view/Web/Topic?t=time, for users without javascript; those users might still be confused if they navigated away from the page and then back, but that seems like acceptable losses to me.
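The first two bullets above could look roughly like this. It is a Python sketch only: the function name `view_headers` and the regular expression are assumptions, and the sketch sends the modification time as `Last-Modified`, which is one reading of the proposal rather than something settled in this discussion.

```python
import re
from email.utils import formatdate

# Matches %SEARCH{...}% and %INCLUDE{...}% style variables (illustrative;
# other dynamic variables are not detected).
DYNAMIC_VARIABLE = re.compile(r"%(SEARCH|INCLUDE)\{")


def view_headers(topic_text, last_modified_epoch):
    """Choose response headers for a view page from its raw topic text."""
    if DYNAMIC_VARIABLE.search(topic_text):
        # Inherently dynamic page: expire immediately.
        return {"Cache-Control": "no-cache", "Expires": "0"}
    # Otherwise report when the topic last changed and let the
    # browser apply its own heuristics.
    return {"Last-Modified": formatdate(last_modified_epoch, usegmt=True)}
```

A plain topic therefore stays cacheable, while one that embeds a search is refetched on every visit.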

How's that sound? I'm not proposing to do any of this, understand; all of my (few) twiki cycles for the near future I'm going to spend on security stuff. I'm just trying to help figure out the right thing to do.

-- AndrewMoise - 01 Dec 2004

Should we modify the templates to put the modified time of a topic in a "Last-Modified" response header to prompt browsers to refetch the page when it changes? I kinda get this impression from http://www.mozilla.org/projects/netlib/http/http-caching-faq.html but most of it was over my head.

-- SamHasler - 01 Dec 2004

It's a good idea to set sensible cache headers on all pages - this was my intention when I originally did the code for BackFromPreviewLosesText. Generally, it's a good idea for the site to say when it thinks the pages will expire - browsers and proxy caches are free to adjust this as needed, but giving the date only forces them to use a heuristic that is probably not right for that site.

As mentioned by Colas on RefreshEditPage, any cache setup must be very configurable, as different sites and people disagree on what is reasonable.

Testing for SEARCH and INCLUDE variables is a good idea, but not really complete - there are many dynamic variables, and ideally plugins would register their variables as to whether they should be considered 'non-cacheable'.

Actually the t parameter is used for RefreshEditPage, which can happen on many browsers including Opera and (I think) Firefox/Mozilla. Opera and other browsers frequently run with User-Agent of MSIE to get round broken sites that don't work unless they think they are working with IE.

Having a long expiration on edit pages seems quite harmless to me - we can't really do much about people putting custom edit links on their own pages and getting confused by the caching effects as a result, but having a low cache time for Edit could lose form data, which seems much more of a problem.

On very slow networks, there is some benefit in having reasonable caching on non-view pages, e.g. changes, but this is site and (worse) user specific.

I like the idea of redirecting but I'm not sure as to the best way of doing this - save already does a redirect as part of its logic, so I'd hope this can be done in the same place without adding JavaScript.

-- RichardDonkin - 05 Dec 2004

Agreed that cache behavior should be configurable.

Agreed that testing for SEARCH and INCLUDE is an incomplete solution -- but it seems to me to solve most of the problems with pages that are inherently dynamic (e.g. BugReport) so that the last modification date is just worthless in terms of figuring out whether to send a new copy of the page or not. Certainly it's better than sending headers that allow caching without considering whether SEARCH and INCLUDE appear on the page.

RefreshEditPage can only happen on many browsers because the current headers request that edit page be cached for 24 hours. That's a workaround for a very stupid bug which is MSIE-specific. We certainly can do something about people putting edit links in their pages and being confused by the odd result; as I mentioned above, we can make the long edit cache-control only happen for MSIE (or its impersonators), so users of other browsers don't have to deal with its fallout. As I said above, though, I think that's more complexity than it's worth (I'm just pointing out that it's certainly an option). The behavior may seem quite harmless to you, but as I said, I had an actual (non-MSIE) user who was actually complaining to me because his handmade edit link wasn't working. Given that the downside to limiting the long edit caching to MSIE users is pretty infinitesimal, I certainly think it's worth thinking about.

Having the cache behavior be user-specific I hadn't thought about; that certainly sounds interesting. Hmm...

save currently redirects with a GET request, which is the only way you can do it with an HTTP redirect. I was talking about javascript as a way of getting a POST "redirect"; I think it's the only way to get one.
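One common way to get such a POST "redirect" is for save to return a small page whose form submits itself when loaded. The Python sketch below only generates that HTML; the function name `post_bounce_page` and both URL arguments are hypothetical, and the fallback for users without JavaScript is the 1-second refresh suggested earlier.

```python
from html import escape


def post_bounce_page(view_url, fallback_url):
    """Return HTML that re-requests view_url via POST when loaded.

    view_url: target of the self-submitting form.
    fallback_url: e.g. the view URL with ?t=NNNN, used without JavaScript.
    """
    action = escape(view_url)    # escape &, <, > and quotes for attributes
    fallback = escape(fallback_url)
    return (
        "<html><head>\n"
        "<noscript>"
        '<meta http-equiv="refresh" content="1; url=' + fallback + '">'
        "</noscript>\n"
        '</head><body onload="document.forms[0].submit()">\n'
        '<form method="post" action="' + action + '">\n'
        '<noscript><a href="' + fallback + '">Continue</a></noscript>\n'
        "</form>\n"
        "</body></html>\n"
    )
```

The view script would have to accept POST for this to work, which is part of what makes this option more invasive than a GET redirect.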

I also hate javascript. Don't worry. :-)

-- AndrewMoise - 07 Dec 2004

RefreshEditPage happened a lot on browsers well before the 24 hour edit headers - check the history. Having a really easy 're-edit this page' link / button on the Preview page would remove some of the temptation for people to hit Back. I don't like the browser sniffing issue either, but removing Edit page caching is not a good idea - I use Back even on Firefox, where it works nicely and is fast (particularly handy on TWiki.org), and I wouldn't want to break this on many pages.

Helping users to do hand-made Edit links is just a matter of pointing them to the relevant TWikiVariables, and I don't think re-introducing the BackFromPreviewLosesText bug is a reasonable tradeoff - it would hurt novice users to help those advanced enough to customise Edit links.

If it would help, setting the cache time to 1 or 4 hours would be OK - most people will hit Back within that period. I set the cache period high because I couldn't see a downside back then.

Thanks for the clarification on redirects - I guess the options are:

  1. Redirect with GET and suffix ?t=NNNN - prevents retrieval of that view page from existing cache, but does allow new page to be cached for purposes of Back button. However, prevents it being cached for general use (e.g. by other proxy cache users).
  2. Redirect using POST, requiring use of JavaScript, with suitable cache control headers. More likely to be a general solution through use of cache headers on all view-like TWiki pages (attach, etc), rather than having to go through all URLs and suffix random strings.

The second solution does seem cleaner but will need to be done in a configurable way. I anticipate lots of interesting 'I didn't want caching at all' responses, so configurability to include 'expire now' is important...

Ideally I'd like 2 types of cache control headers:

  1. 'Fresh page' - just modified by save, attach, etc, or highly dynamic due to TWikiVariables - don't cache
  2. 'Normal static page' - anything else, assumed to be fairly static - cache for 1, 8 or 24 hours (highly configurable including '0'!)

The 'fresh' case will slightly reduce cacheability of 'normal static' pages, but avoids serious confusion, and the first time someone accesses that page in 'normal static' mode (e.g. from WebNotify) it will be cached for future use, which I think is fine.

There is one other caching issue - attachments served directly by Apache. Would be useful to document to people how to set cache headers through Apache directives to limit cacheability - someone ran into exactly the fresh vs. other on uploading an attachment then re-uploading it a bit later, getting the cached original version on testing download.

Belated response to SamHasler - putting cache control in templates is not a good idea, since HTML META directives are mostly ignored by proxy caches. See BrowserAndProxyCacheControl. In fact, let's deprecate the HTTP_EQUIV TWikiPreferences ASAP unless someone objects.

-- RichardDonkin - 08 Dec 2004

Topic attachments
Attachment | History | Size | Date | Who | Comment
ViewAfterSaveCachesOldPage-ugly-fix.patch | r2 r1 | 1.6 K | 2004-11-19 - 19:29 | AndrewMoise | Patch that makes non-edit pages always reload from the server
Topic revision: r22 - 2004-12-08 - RichardDonkin
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.