The statistics script is currently memory hungry because it reads the log file of the current month into an array.
SourceForge seems to kill processes that eat too much memory.
The statistics script silently dies after reading a big log file. TWiki.org produces monthly log files of up to 18 MB. This is the cause of the problem reported in Support.WebStatisticsNotUpdatingOnWikilearn.
The correct fix is to copy the log file to a temp file (so that other processes can continue adding lines to the log file) and process the temp file line by line.
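A minimal sketch of that fix in Python (the real script is Perl; the function name `snapshot_lines` is hypothetical). It copies the live log to a private temp file, then reads the copy lazily, so memory use stays flat however large the log grows.

```python
import os
import shutil
import tempfile


def snapshot_lines(log_path):
    """Copy the live log to a private temp file, then yield its lines one by one.

    Other processes can keep appending to log_path while we read the copy,
    and only the current line is held in memory.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # mkstemp returns an open descriptor; copyfile reopens by name
    try:
        shutil.copyfile(log_path, tmp_path)
        # latin-1 maps every byte to a character, so odd bytes never raise
        with open(tmp_path, encoding="latin-1") as fh:
            for line in fh:  # file objects are read lazily, one line at a time
                yield line.rstrip("\n")
    finally:
        os.remove(tmp_path)  # runs when the generator is exhausted or closed
```

Because it is a generator, a caller such as `for line in snapshot_lines("log200203.txt")` never holds more than one line, and the temp copy is deleted once the loop ends.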
--
PeterThoeny - 13 Jan 2002
I'm planning to fix this quite soon. I hope the problem is due to memory usage, not CPU, as a rewrite for line-by-line mode would not fix the latter (although it would be a start, as the script could process N lines, then spawn another script for the next N lines, etc.).
Just thought I'd check to see if someone else is already working on this.
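The "process N lines, then hand over" idea needs each run to know where the previous one stopped. A minimal Python sketch, assuming the hand-over is a byte offset (the function name and the offset scheme are illustrative, not part of the script):

```python
def process_chunk(path, start_offset, max_lines, handle_line):
    """Handle at most max_lines lines of path, starting at byte start_offset.

    Returns the byte offset to resume from, or None once the file is exhausted,
    so a follow-up run (possibly a freshly spawned process) can continue.
    """
    with open(path, "rb") as fh:  # binary mode so seek()/tell() are plain byte offsets
        fh.seek(start_offset)
        for _ in range(max_lines):
            raw = fh.readline()
            if not raw:
                return None  # end of file reached
            handle_line(raw.decode("latin-1").rstrip("\r\n"))
        return fh.tell()
```

A driver calls it with offset 0, then with each returned offset until it gets None; in the spawn-per-chunk scheme each call would be a separate process that receives the offset as an argument.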
--
RichardDonkin - 03 Mar 2002
Here's the improved statistics script - as suggested, it uses a temp file and processes it line by line, using a hash of hashes to prove I just read Programming Perl ...
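The hash-of-hashes layout, sketched in Python with nested dictionaries (`views[web][topic] -> count`). This assumes the pipe-delimited TWiki log layout `| date - time | user | action | Web.Topic | extra | ip |`; the function name `tally` is hypothetical and only counts views.

```python
from collections import defaultdict


def tally(lines):
    """Count topic views per web in a dict of dicts: views[web][topic] -> count.

    Assumes lines shaped like '| date - time | user | action | Web.Topic | extra | ip |'.
    """
    views = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        # fields[0] is the empty string before the leading '|'
        if len(fields) < 5:
            continue  # skip lines that do not look like log entries
        action, web_topic = fields[3], fields[4]
        if action != "view" or "." not in web_topic:
            continue
        web, topic = web_topic.split(".", 1)
        views[web][topic] += 1
    return views
```

Since the outer and inner levels are created on first use, no web or topic list has to be known in advance, and one pass over the lines fills every web's table at once.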

The core of the script is completely re-written, though the input/output parts are largely unchanged.
It does seem to work OK; I've run it in parallel with the current script a few times. I've also tested to make sure that topic views, saves and uploads show up correctly. The CPU time used on a PIII/700 laptop running Windows was 5 seconds for a 50K log file (not very efficient!), but that's almost identical to the CPU time of the current script.
Known issue - if you have two topics with the same frequency count and names that sort next to each other, you may find that the old and new scripts show one or the other within the 'top N' list. This is not a bug; it's just a vagary of the way sorting works. It's not worth doing a sort within each frequency count, given that the script is already quite complex.
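To illustrate where the tie comes from: sorting on the count alone leaves equal counts in whatever order they arrived. If one did want a repeatable order, a secondary key on the topic name is enough. A Python sketch (the function name `top_n` is hypothetical; the Perl script deliberately does not do this):

```python
def top_n(counts, n):
    """Return the n most frequent (topic, count) pairs.

    Sorting on count alone keeps ties in input order; adding the topic
    name as a secondary key makes the order of ties deterministic.
    """
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

With `{"WebHome": 3, "WebIndex": 3, "WebChanges": 5}` and `n = 2`, this always returns `WebChanges` then `WebHome`, regardless of which of the two tied topics was seen first.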
You can test this by installing it as statistics-new and editing the line that refers to TestStatistics.txt so that it builds that topic instead. This should let you run both scripts side by side for a while. Let me know how you get on - it does seem to work, and I did a lot of incremental testing, but you never know...
BTW I'm away for a few days from tomorrow, back on Tuesday - just so people know not to expect bug fixes while I'm skiing!
--
RichardDonkin - 06 Mar 2002
Has anyone tested this? I'm curious to know how well it works before I commit it to CVS and inflict it on everyone.
--
RichardDonkin - 18 Mar 2002
I installed it, and it looks like the File::Temp Perl module is also needed in the TWiki installation. I will try it when I get a chance to connect to CPAN and download the module.
--
JohnRouillard - 18 Mar 2002
Code looks good. Small suggestion: it is better to remove the dependency on File::Temp. We can generate a temporary file name with time() and the pid, for example.
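The suggested naming scheme, sketched in Python (the helper name and the `twiki-stats` prefix are hypothetical). The pid separates concurrent runs, and the timestamp keeps a later run that reuses a recycled pid from colliding with a leftover file.

```python
import os
import time


def temp_log_name(tmp_dir, prefix="twiki-stats"):
    """Build a temp file name from the current time and the process id."""
    # e.g. /tmp/twiki-stats.1016900000.4242 (time in whole seconds, then pid)
    return os.path.join(tmp_dir, f"{prefix}.{int(time.time())}.{os.getpid()}")
```

A predictable name like this is easy to generate anywhere, but it is also guessable, which is the race File::Temp guards against.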
--
PeterThoeny - 22 Mar 2002
I've attached a revised version that removes the File::Temp dependency. It is somewhat less portable and secure (File::Temp takes great care to avoid temp file race conditions that create security holes), but it should work with Unix and Windows temp directory locations, and it is much easier to install.
The temp dir code should probably be moved somewhere else, e.g. TWiki.pm, so that it's available to other scripts that need to create temp files. Ideally we would use File::Temp if it's installed, but default to the built-in code.
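A Python sketch of what such built-in fallback code might look like; it is not the attached Perl code. It picks the first writable directory from the usual environment variables and defaults (this candidate list is an assumption), then creates the file with `O_CREAT | O_EXCL`, so the open fails if the name already exists or is a planted symlink, closing the main race File::Temp protects against. In Python, `tempfile.mkstemp` already does both steps, much as File::Temp would in Perl.

```python
import os


def find_temp_dir():
    """Return the first writable directory from env vars, then common defaults."""
    candidates = [os.environ.get(v) for v in ("TMPDIR", "TEMP", "TMP")]
    candidates += ["/tmp", "/var/tmp", r"C:\temp", "."]
    for d in candidates:
        if d and os.path.isdir(d) and os.access(d, os.W_OK):
            return d
    raise RuntimeError("no writable temp directory found")


def create_exclusive(path):
    """Create path for writing only if it does not already exist.

    O_EXCL makes the existence check and the creation one atomic step,
    so an attacker cannot slip a file or symlink in between them.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    return os.fdopen(fd, "w")
```

If `create_exclusive` raises `FileExistsError`, the caller can simply generate a new name and retry.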
Let me know how it works - I've only tested it on a small log file to date.
UPDATE: Tested on Perl 5.005_03 on Linux, as well as 5.6.1 on Windows. New version attached that avoids the [:alnum:] construct (not supported until Perl 5.6).
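The portable replacement for a POSIX class such as `[:alnum:]` is an explicit range. Python's `re` module has no POSIX classes either, so the same approach applies there. A sketch that splits a `Web.Topic` name; the pattern (web names starting with a capital letter) is an assumption for illustration:

```python
import re

# Explicit ranges instead of a POSIX class such as [:alnum:], which
# Perl before 5.6 (and Python's re module) does not understand.
WEB_TOPIC = re.compile(r"^([A-Z][A-Za-z0-9]*)\.([A-Za-z0-9]+)$")


def split_web_topic(name):
    """Return (web, topic) for a 'Web.Topic' string, or None if it does not match."""
    m = WEB_TOPIC.match(name)
    return (m.group(1), m.group(2)) if m else None
```

For example, `split_web_topic("Main.WebHome")` gives `("Main", "WebHome")`.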
--
RichardDonkin - 24 Mar 2002
Now in CVS for TWikiAlphaRelease. It still has quite a lot of debugging code, commented out - probably worth keeping this in for future enhancement work.
--
RichardDonkin - 25 Mar 2002
Great, thanks Richard! The statistics script does work now at TWiki.org after updating it to the latest TWikiAlphaRelease.
This month's log200203.txt file so far has 22 MB and 244K lines. The script sits for 40 seconds while doing the calculations (before updating the WebStatistics topic in each web). This is perfectly fine; performance is not an issue because the script is called by a scheduled task.
--
PeterThoeny - 27 Mar 2002
Thanks for the feedback - 40 seconds is not bad for a Perl script handling 22 MB of data! I've just done an update in CVS to handle the renaming of topics, also attached here - this would have been quite hard with the original code.
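One way to handle renames when aggregating, sketched in Python: fold the counts for old names into the final name, following chains such as A to B to C. The `renames` mapping is hypothetical; how it is built from the log's rename entries is not shown here.

```python
def apply_renames(counts, renames):
    """Fold counts for renamed topics into their final names.

    counts: {topic: count}; renames: {old_name: new_name}.
    Chains such as A -> B -> C are followed to the last name.
    """
    def final_name(name):
        seen = set()
        while name in renames and name not in seen:
            seen.add(name)  # stop if the renames ever form a cycle
            name = renames[name]
        return name

    merged = {}
    for topic, n in counts.items():
        target = final_name(topic)
        merged[target] = merged.get(target, 0) + n
    return merged
```

With counts keyed by topic, this is a single pass after the log has been read, which is why the per-topic tables make renames manageable.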
--
RichardDonkin - 27 Mar 2002
Have a look at http://twiki.org/cgi-bin/view/Plugins/WebStatistics?rev=1.258 - it seems this run of the script introduced the spurious 'Jan 2001' line. Perhaps you could send me the SourceForge logs for TWiki, so I can debug this locally (unless you feel like doing this)?
--
RichardDonkin - 28 Mar 2002
I removed the debug messages that were written whenever the statistics script encountered a problem line, because they blow up the debug.txt file without much gain. The debug.txt file at TWiki.org had grown to 187 MB! The change is in TWikiAlphaRelease and at TWiki.org.
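If some diagnostics are still wanted, one option is to count problem lines instead of logging each one, then write a single summary message. A Python sketch under that assumption (the function name and the `parse` callback contract are hypothetical):

```python
def parse_lines(lines, parse):
    """Parse each line, counting rather than logging the ones that fail.

    parse returns a record, or None for a line it cannot handle.
    Returns (records, bad_count) so the caller can emit one summary message.
    """
    records, bad = [], 0
    for line in lines:
        rec = parse(line)
        if rec is None:
            bad += 1
        else:
            records.append(rec)
    return records, bad
```

The debug output then grows by one line per run rather than one line per bad log entry.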
--
PeterThoeny - 20 Oct 2002