Tags: caching, cookbook, performance, syndication

How to Cache RSS Feeds with Apache RewriteRule

Public TWiki sites may receive a lot of traffic from RSS feeds. For example, one third of the topic views on TWiki.org are caused by RSS feeds (151K of 452K total views in any given week).

You can use wget to cache RSS feeds as static XML files and use Apache's RewriteRule to deliver the cached file whenever the corresponding TWiki view URL is requested.

Here is how to set this up:

1. Create a /feeds directory under the htdocs root directory

2. Create a cron job that generates the static files. Example for the Codev web:

2,17,32,47 * * * * cd /path/to/htdocs/feeds; /path/to/bin/wget --http-user TWikiGuest --http-passwd guest -O CodevWebRss.xml "http://twiki.org/cgi-bin/view/Codev/WebRss?t=t" > .log.txt 2>&1

The ?t=t query string ensures that the rewrite rule does not fire for the cache update itself; for the same reason, RSS feeds requested with a search parameter bypass the cache. Logging in as TWikiGuest prevents a redirect when a recently changed topic has a view access restriction; without the login, the redirect leaves an invalid file of zero bytes.

In this example, the generated file can be accessed as http://twiki.org/feeds/CodevWebRss.xml

3. Update Apache's httpd.conf with these rules for the cgi-bin directory:

    RewriteEngine On
    RewriteCond %{QUERY_STRING}  ^$
    RewriteRule view/(Codev|Main|Plugins|Sandbox|Support|TWiki)/WebRss$ /feeds/$1WebRss.xml [L,T=application/rss+xml]

Make sure the rewrite module (mod_rewrite) is loaded; consult the Apache docs for details.

4. Restart Apache with sbin/apachectl restart
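
To see which requests the rule in step 3 catches, the matching logic can be mimicked in a few lines of shell. This is only a sketch: feed_cache_path is a hypothetical helper, not part of TWiki or Apache, and it approximates the RewriteCond/RewriteRule pair with a test and a sed expression.

```shell
#!/bin/sh
# Hypothetical helper that mimics the RewriteCond/RewriteRule pair from step 3.
# Prints the cached file path for a request, or returns 1 if the request
# would fall through to the TWiki view script.
# Usage: feed_cache_path REQUEST_PATH QUERY_STRING
feed_cache_path() {
    path=$1
    query=$2
    # RewriteCond %{QUERY_STRING} ^$ : only rewrite when there is no query string.
    [ -z "$query" ] || return 1
    # RewriteRule view/(web list)/WebRss$ : extract the web name, if it is listed.
    web=$(printf '%s\n' "$path" |
        sed -n -E 's#^.*view/(Codev|Main|Plugins|Sandbox|Support|TWiki)/WebRss$#\1#p')
    [ -n "$web" ] || return 1
    printf '/feeds/%sWebRss.xml\n' "$web"
}
```

For example, feed_cache_path /cgi-bin/view/Codev/WebRss "" prints /feeds/CodevWebRss.xml, while passing t=t as the query string returns a non-zero status. That is why the cron job's own request is never served from the cache.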

-- Contributors: PeterThoeny, MichaelDaum

Discussion

The itching factor for this "how-to" is described in CacheWebRssFeedForSpeed.

-- PeterThoeny - 09 Mar 2006

The rewrite rule should use application/rss+xml and also cache the WebAtom feed. Here's a more generic rewrite rule that also supports the BlogPlugin's extra feeds:

RewriteEngine On
RewriteCond %{QUERY_STRING}  ^$
RewriteRule view/(.*)/(Web(Rss|Atom)(Combined|Comments|Teaser)?)$ /feeds/$1$2.xml [L,T=application/rss+xml]

Note that you have to adjust this if you have hierarchical webs.
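
One possible adjustment, sketched here under assumptions: for a nested web such as Main/Sub (a made-up name), the greedy (.*) captures Main/Sub, so the generic rule looks for /feeds/Main/SubWebRss.xml. The fetching side then has to write to that same nested path. feed_file_for is a hypothetical helper and not part of the attached script.

```shell
#!/bin/sh
# Hypothetical helper: compute the file that the generic rule
# (/feeds/$1$2.xml) would look for, and make sure its directory exists.
# Usage: feed_file_for FEEDS_DIR WEB TOPIC
feed_file_for() {
    feeds_dir=$1
    web=$2      # may contain slashes for a hierarchical web, e.g. Main/Sub
    topic=$3    # e.g. WebRss or WebAtom
    file="$feeds_dir/$web$topic.xml"
    # For nested webs the parent directories must exist before writing.
    mkdir -p "$(dirname "$file")" || return 1
    printf '%s\n' "$file"
}
```

A fetch script could then write its download to the path printed by feed_file_for /path/to/htdocs/feeds Main/Sub WebRss, leaving the rewrite rule itself unchanged.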

I attached a getfeeds shell script to be used in the cron job instead of coding the wget call into the crontab directly. Store it in an arbitrary directory, copy the getfeeds.conf.example file to the same directory, rename it to getfeeds.conf, and change the default values in it.

I'd recommend regenerating the feeds at a lower frequency, such as once an hour:

0 * * * *  /home/www-data/twiki/getfeeds

(adjust the path to getfeeds).

-- MichaelDaum - 09 Mar 2006

Thanks Micha for the additional info and script.

On TWiki.org I actually installed many cron jobs, separated by 2 minutes. This refreshes each feed at 15-minute intervals and distributes the load on the server. The getfeeds script is useful but puts a burst of load on the server, depending on the number of feeds you have.

-- PeterThoeny - 09 Mar 2006

Then add SLEEP=60 to your getfeeds.conf file (defaults to 1 second).

-- MichaelDaum - 09 Mar 2006
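
The idea of pausing between feeds can be sketched as a small loop. This is NOT the attached getfeeds script; the variable names (VIEW_URL, SLEEP, FETCH) and both functions are assumptions chosen for illustration, and only WebRss is fetched to keep the example short.

```shell
#!/bin/sh
# Sketch of a staggered refresh loop. NOT the attached getfeeds script;
# the variable and function names are assumptions chosen for illustration.
VIEW_URL=${VIEW_URL:-http://twiki.org/cgi-bin/view}
SLEEP=${SLEEP:-60}          # seconds to pause between two feeds
FETCH=${FETCH:-fetch_feed}  # command used to download one URL to a file

# Default downloader: wget with the guest login from the cron example.
fetch_feed() {
    wget -q --http-user TWikiGuest --http-passwd guest -O "$2" "$1"
}

# Usage: refresh_feeds OUT_DIR WEB...
refresh_feeds() {
    out_dir=$1
    shift
    status=0
    first=1
    for web in "$@"; do
        # Pause before every feed except the first to spread the load.
        [ "$first" = 1 ] || sleep "$SLEEP"
        first=0
        tmp="$out_dir/${web}WebRss.xml.tmp"
        # A non-empty query string keeps the rewrite rule from firing.
        "$FETCH" "$VIEW_URL/$web/WebRss?t=$(date +%s)" "$tmp" &&
            mv "$tmp" "$out_dir/${web}WebRss.xml" || status=1
    done
    return "$status"
}
```

Called as refresh_feeds /path/to/htdocs/feeds Codev Main Plugins from a single hourly cron entry, this would fetch one feed per minute instead of all of them at once.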

Cool!

-- PeterThoeny - 09 Mar 2006

I've edited the script that MD provided to use curl instead of wget. wget was working nicely until it started fetching the same topic over and over. I had an intuition that using curl I might have better results, so I changed the script. Doing so helped greatly and sped things up.

On line 42 of getfeeds I now have:

#  wget -q -O $TMP_FILE $VIEW_URL/$WEB/$TOPIC?t=`date +"%s"` && mv $TMP_FILE $OUT_FILE && chmod go+r $OUT_FILE
  curl --compressed -s -G -o $TMP_FILE $VIEW_URL/$WEB/$TOPIC?t=$(date +"%s") && mv $TMP_FILE $OUT_FILE && chmod 644 $OUT_FILE

It's a simple change (oh, and the chmod change too; I just wanted to be sure of the right permissions on the file). BTW, if you're not using mod_deflate or mod_gzip, LEAVE OFF the --compressed option; it won't work otherwise.

Also, restarting the daemon as described above is overkill; a reload is quite often enough. You only need to restart when you've changed a module (added or removed one, for instance).

HTH

-- EricCote - 15 Mar 2006

The cached RSS feeds randomly failed on TWiki.org. This happened whenever a recently changed topic had a view access restriction: the restriction triggered a redirect, which in turn resulted in an RSS file of zero bytes. I fixed this by supplying the TWikiGuest user to wget.

-- PeterThoeny - 17 Apr 2006
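
As an extra safeguard against the zero-byte files described above, a fetch script could refuse to publish a download that is empty or does not look like a feed, so that the previous cached copy stays in place. The following is a minimal sketch; publish_feed is a hypothetical helper, and the check for an rss, feed, or rdf:RDF element is an assumption about what the feeds contain.

```shell
#!/bin/sh
# Hypothetical helper: publish a freshly downloaded feed only if it looks valid.
# On failure the temporary file is removed and the old cached copy is kept.
# Usage: publish_feed TMP_FILE OUT_FILE
publish_feed() {
    tmp=$1
    out=$2
    # [ -s FILE ] is true only if the file exists and is larger than zero bytes.
    if [ -s "$tmp" ] && grep -Eq '<(rss|feed|rdf:RDF)' "$tmp"; then
        mv "$tmp" "$out" && chmod 644 "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A cron script would download to a temporary file first and then call publish_feed with that file and the final name, for example publish_feed CodevWebRss.xml.tmp CodevWebRss.xml.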

If the request is view/Main/WebRss, then "RewriteRule view/(.*)/(Web(Rss|Atom)(Combined|Comments|Teaser)?)$ /feeds/$1$2.xml" works, but what is the rule if the request is view/Main/WebRss?skin=rss ? I tried "RewriteRule view/(.*)/(Web(Rss|Atom)(Combined|Comments|Teaser)?)\?skin=rss$ /feeds/$1$2.xml", but it does not work.

-- LunaLin - 10 Jan 2007

Topic attachments

  getfeeds (r2, 1.9 K, 2006-03-09 18:32): shell script to fetch TWiki's RSS and Atom feeds
  getfeeds.conf.example (r1, 0.2 K, 2006-03-09 18:26): example configuration file for getfeeds

Topic revision: r9 - 2007-01-10 - LunaLin
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.