Please feel free to discuss implementation details and new features on this page.
Description
Parses Apache access log files, including gzipped ones, to produce statistics.
To do
- Replace the die with an error message when opening the log file fails - DONE 21 Feb 2006
- Add support for parsing multiple access log files, i.e. the zipped log file history - DONE 21 Feb 2006
- As highlighted by Tobias below, the plugin needs to be secured to prevent data mining. It can also be quite demanding for servers with a large log file history. This is acceptable in my opinion if we can restrict access to the statistics to specific users or groups. If an unauthorized user views a page that uses the ACCESSSTATS tag, we could output an error message instead of the statistics. - OPEN
- Currently the access log files are read and parsed once for each ACCESSSTATS tag on a page. Maybe there is a way to read the log files only once per page? Should we use the commonTagsHandler rather than a registered tag handler? - OPEN
- As recommended by Tobias below, one should be able to limit the scope of the access log search to the TWiki installation directory. A possible solution is suggested here. - DONE 27 Feb 2006
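The open "read only once per page" item above could be approached by memoizing the log lines for the duration of one page rendering. The following is a minimal sketch in Python rather than the plugin's Perl; the class `AccessLogCache` and the `read_lines` callable are hypothetical names used for illustration and are not part of the plugin.

```python
import re


class AccessLogCache:
    """Reads the log lines once and reuses them for every tag on a page.

    Hypothetical helper: one instance would be created per page rendering.
    """

    def __init__(self, read_lines):
        # read_lines is an assumed callable that returns all access log lines.
        self._read_lines = read_lines
        self._lines = None
        self.reads = 0  # how many times the logs were actually read

    def lines(self):
        # Load the lines on first use only; later calls reuse the list.
        if self._lines is None:
            self._lines = list(self._read_lines())
            self.reads += 1
        return self._lines

    def count(self, pattern):
        # Count the lines matching the regular expression of one tag.
        regex = re.compile(pattern)
        return sum(1 for line in self.lines() if regex.search(line))
```

With this shape, each tag on the page would call `count()` with its own pattern while the expensive read happens once. It does not help across page views; that would need persistent storage.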
Possible new features
Could display the matched lines or parts of them (e.g. the IP address).
--
StephaneLenclud - 18 Feb 2006
Discussion
Thanks Stephane for contributing this Plugin and sharing it with the TWikiCommunity!
I made some minor changes to the Plugin topic, feel free to roll that back in the next release.
How about measuring and documenting the PluginBenchmarks numbers?
--
PeterThoeny - 20 Feb 2006
Version 1.001 is now available. I'll try to have a look at the PluginBenchmarks at some point. I would like to optimize things a bit before benchmarking, though. I'd like to find a way to read the access log files only once per page rendering instead of once for each tag.
--
StephaneLenclud - 21 Feb 2006
Is the regex of this Plugin limited to TWiki sites?
If not, this Plugin possibly opens the door to unwanted data mining on the server. It should be configurable (outside of TWiki pages) to limit access to the stats of TWiki sites only. But then, it should be enough to parse TWiki's own log files to get this information instead of touching Apache's log files.
I would like to see some notes about security on the Plugin page before using it.
--
TobiasRoeser - 25 Feb 2006
No, the regexp is not limited to the TWiki site. The idea was to get hit counts for attachments. Since that does not appear to be available through the TWiki web statistics topic, I thought of getting it from the Apache logs directly; I'm not sure you can get it from the TWiki logs, can you? You are completely right that one could use the ACCESSSTATS tag to get any kind of information from the access log. I did not bother solving that issue because my TWiki is not open for editing at the moment. I have an action open above for securing access to the statistics. I've now edited that entry in the To Do list and added a new one. An easy way to limit the regexp to the TWiki installation directory would be to disable the default parameter in the ACCESSSTATS tag. I'll publish a version in which it's easy to enable or disable usage of the default parameter through settings in the .pm file.
Thanks for your input.
--
StephaneLenclud - 27 Feb 2006
Stephane, I've not looked at your code, so I don't know when you're parsing, or how you're making sure that you don't re-parse my 6 GB logfile.
I implemented something similar for mrtg. It's committed in
http://svn.twiki.org/svn/twiki/trunk/tools/admin/mrtg
and it gives us the pretty graph at
http://twiki.org/~sdowideit/mrtg/twiki/twiki.html
showing the number of TWiki topic requests served per 5-minute block. I guess I should be packaging up an MrtgContrib...
--
SvenDowideit - 09 Apr 2007
Thanks for the pointer. That plugin is making sure it parses 6 GB of logs for each tag.

It's not doing any caching, nor saving any persistent data. It's the very first TWiki plugin I developed, as an exercise, and its implementation is very straightforward.
I just wanted to know how many downloads I had on certain attachments. Basically, you publish a document on your site and you just want to measure the public interest in that single document.
If my web site ever becomes very popular I'll surely implement some persistence so as not to parse the whole log history constantly.

I'm not planning to turn that into a full-blown log parser. I was just fixing an issue I have had since I moved my server from openSUSE to Ubuntu, and adding it to SVN at the same time.
--
StephaneLenclud - 09 Apr 2007
Hi guys, when I tried to use the plugin I get:
TWiki detected an internal error - please check your TWiki logs and webserver logs for more information.
Can't opendir path: Permission denied
I changed the owners to root:apache and the permissions on the /var/log/httpd directory and the access_log files to 654. It seems the execute bit needs to be set for the plugin to work.
--
PeterStephens - 12 Aug 2007
I think directories always have to have execute permissions on Linux. At least most of my directories have. But the log files themselves are -rw-r--r--. To be sure you don't have a recurring problem whenever a new log file is created, you have to set up your log rotation system to use the desired permissions. On my Ubuntu installation, for instance, I had to edit /etc/logrotate.d/apache2 and fix the permissions specified in there.
--
StephaneLenclud - 13 Aug 2007
Hi, Stephane:
I use %ACCESSSTATS{attachment="TeamSpace-URD.doc"}% to count the downloads of my attachment file and it shows 3! After I downloaded the file 3 more times (and I can see it appears in my httpd log files 6 times), the variable still shows 3! Does anybody have the same problem?
BTW, will the plugin parse all my access log like access_log, access_log.1, access_log.2...?
--
MagicYang - 02 Jan 2008
Hi MagicYang,
Sorry for the late answer.
It should indeed parse those files and the gz files too; that was the intention anyway. However, you may find that this plug-in's behavior was tuned for my particular Apache setup and might not be parsing some of your log files.
I'm guessing your problem comes from the fact that your access files are named access_log whereas mine are access.log. To fix that, open the AccessStatsPlugin.pm file and check the values of $accessLogFileName and $accessLogDirectory. If it's still not working you may want to take a look at the getAccessLogLines sub.
--
StephaneLenclud - 19 Feb 2008
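To illustrate the file naming issue discussed above, here is a minimal sketch in Python (not the plugin's Perl code) of collecting a base log file together with its rotated and gzipped siblings. The function names are hypothetical; `directory` and `base_name` play the role of `$accessLogDirectory` and `$accessLogFileName`, and the assumption is that rotated files share the base name as a prefix (for example access_log.1 or access_log.2.gz).

```python
import glob
import gzip
import os


def find_access_logs(directory, base_name):
    """Return base_name plus rotated variants such as base_name.1 or base_name.2.gz."""
    pattern = os.path.join(glob.escape(directory), glob.escape(base_name) + "*")
    return sorted(glob.glob(pattern))


def read_log_lines(path):
    """Yield the lines of a log file, decompressing .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_hits(directory, base_name, needle):
    """Count the lines containing needle across all matching log files."""
    return sum(
        1
        for path in find_access_logs(directory, base_name)
        for line in read_log_lines(path)
        if needle in line
    )
```

A base name of access.log will not pick up files named access_log, which matches the symptom reported above: a count that stops moving because the live log file is never read.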
MartinSeibert: Thanks for your interest in this plug-in. However, I'm not planning to add screenshots to the documentation page. All it does is replace the %ACCESSSTATS% tag with a number. For instance, it can give you the number of times an attachment was downloaded.
--
StephaneLenclud - 19 Feb 2008
Stephane: Okay. Thank you.
--
MartinSeibert - 20 Feb 2008
That plug-in really needs caching now. At least my server does, since it now needs to parse 2 years of logs for each %TAG%. Is there any Perl/TWiki API I should use for storing my cached data?
Some ideas:
- We could cache the hit counts of compressed logs and get the final number on the fly by adding the results from the uncompressed logs.
- We could use a crontab job to do the caching for us, but then how on earth will the job know which regex/attachment/topic to look for?
- We could also possibly use some AJAX magic to parse the logs asynchronously.
--
StephaneLenclud - 20 Feb 2008
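The first caching idea above can be sketched as follows. This is a Python illustration under stated assumptions, not the plugin's implementation: it assumes compressed logs no longer change once rotated, and the function names and the JSON cache file are hypothetical choices.

```python
import gzip
import json
import os
import re


def count_matches(path, regex):
    """Count the lines of one log file that match the compiled regex."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return sum(1 for line in handle if regex.search(line))


def cached_hit_count(log_paths, pattern, cache_file):
    """Sum hits over all logs, reusing stored counts for compressed logs."""
    regex = re.compile(pattern)
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as handle:
            cache = json.load(handle)

    total = 0
    dirty = False
    for path in log_paths:
        if not path.endswith(".gz"):
            # Uncompressed logs may still grow, so count them on the fly.
            total += count_matches(path, regex)
            continue
        # Assumption: a .gz log with the same path and mtime is unchanged.
        key = f"{path}|{os.path.getmtime(path)}|{pattern}"
        if key not in cache:
            cache[key] = count_matches(path, regex)
            dirty = True
        total += cache[key]

    if dirty:
        with open(cache_file, "w") as handle:
            json.dump(cache, handle)
    return total
```

Because the key includes the path, a rotation that renames access.log.2.gz to access.log.3.gz causes one recount of that file; keying on a content hash instead would avoid that at the cost of reading the file anyway.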
I've seen the light after installing AWStats on my machine. Sooner or later I'm planning to completely rework that plug-in to get its statistics from the Codev.AWStats database.
I wonder if I should just deprecate the current behavior or implement a separate AWStatsPlugin? Keeping that plug-in in its current state does not make much sense as it does not scale at all. Pages just won't load on my server with only one year of logs.
--
StephaneLenclud - 16 Apr 2008
Can you tell me why this plugin reads the Apache log and not the TWiki log?
--
VickiBrown - 2009-10-20
I changed the modification policy of this plugin to PleaseFeelFreeToModify after checking with StephaneLenclud.
--
PeterThoeny - 2011-05-03