The statistics script is currently memory hungry because it reads the entire log file for the current month into an array.

SourceForge seems to kill processes that eat too much memory. The statistics script silently dies after reading a big log file. TWiki.org produces monthly log files of up to 18 MB. This is the cause of the problem reported in Support.WebStatisticsNotUpdatingOnWikilearn.

The correct fix is to copy the log file to a temp file (so that other processes can continue adding lines to the log file) and process the temp file line by line.
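To illustrate the copy-then-stream approach described above, here is a minimal sketch. It is written in Python rather than the script's Perl, purely as an illustration; the function name `count_lines_from_snapshot` is hypothetical and is not part of the statistics script.

```python
import os
import shutil
import tempfile


def count_lines_from_snapshot(log_path):
    """Copy the log to a temp file, then read the copy one line at a time."""
    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # only the path is needed; copyfile reopens the file itself
    try:
        # Other processes can keep appending to the original while we work.
        shutil.copyfile(log_path, tmp_path)
        total = 0
        with open(tmp_path, encoding="utf-8", errors="replace") as snapshot:
            for line in snapshot:  # streams; never holds the whole file in memory
                if line.strip():
                    total += 1
        return total
    finally:
        os.remove(tmp_path)  # always clean up the snapshot
```

Memory use stays roughly constant regardless of log size, because only one line is held at a time instead of the whole month's log.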

-- PeterThoeny - 13 Jan 2002

I'm planning on fixing this quite soon - I hope it is due to memory usage not CPU, as a rewrite for line-by-line mode would not fix the latter (although it would be a start, as the script could process N lines, then spawn another script for the next N lines, etc.)

Just thought I'd check to see if someone else is already working on this.

-- RichardDonkin - 03 Mar 2002

Here's the improved statistics script - as suggested, it uses a temp file and processes this line by line, using a hash of hashes to prove I just read Programming Perl ... ;-) The core of the script is completely rewritten, though the input/output parts are largely unchanged.
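For readers unfamiliar with the "hash of hashes" idea, a rough sketch follows, again in Python (a dict of dicts) rather than Perl. The pipe-delimited field layout (date, user, action, topic, ...) is an assumption made for the example, and `tally` is a hypothetical name; the attached script may parse its log differently.

```python
from collections import defaultdict


def tally(lines):
    """Return counts[action][topic], built up one log line at a time."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        # Assumed layout: | date | user | action | Web.Topic | extra | IP |
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        if len(fields) < 4:
            continue  # skip lines that do not match the assumed layout
        action, topic = fields[2], fields[3]
        counts[action][topic] += 1
    return counts
```

The outer key is the action (view, save, upload) and the inner key is the topic, so only the counters are kept in memory, not the log lines themselves.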

It does seem to work OK, having run it in parallel a few times with the current script. I've also tested to make sure that topic views, saves and uploads show up correctly. The CPU time used on a PIII/700 laptop using Windows was 5 seconds for a 50K log file (not very efficient!) but that's almost identical to the current CPU time.

Known issue - if you have two topics with same frequency count and names that sort next to each other, you may find that the old and new script show one or the other within the 'top N' list. This is not a bug, it's just a vagary of the way sorting works - it's not worth doing a sort within each frequency count, given that the script is already quite complex.
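The tie behaviour can be sketched as follows (Python, for illustration only; `top_n` and the sample topic names are hypothetical). Sorting on the count alone leaves tied topics in whatever order the underlying data structure yields them, which is why two implementations can disagree at the cut-off; adding the name as a secondary key would make the result deterministic, at the cost of a slightly more complex sort.

```python
def top_n(counts, n, break_ties=False):
    """Return the n (topic, count) pairs with the highest counts."""
    if break_ties:
        key = lambda item: (-item[1], item[0])  # count descending, then name
    else:
        key = lambda item: -item[1]  # count only; tied topics keep input order
    return sorted(counts.items(), key=key)[:n]


views = {"WebHome": 5, "TopicB": 3, "TopicA": 3}
print(top_n(views, 2))                   # which tied topic appears depends on input order
print(top_n(views, 2, break_ties=True))  # always picks TopicA ahead of TopicB
```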

You can test this by installing it as statistics-new, and editing the line that refers to TestStatistics.txt so that it builds that topic instead. This should let you run both scripts side by side for a while. Let me know how you get on - it does seem to work, and I did a lot of incremental testing, but you never know...

BTW I'm away for a few days from tomorrow, back on Tuesday - just so people know not to expect bug fixes while I'm skiing!

-- RichardDonkin - 06 Mar 2002

Has anyone tested this? I'm curious to know how well it works, before I commit it to CVS and inflict it on everyone :-)

-- RichardDonkin - 18 Mar 2002

I installed it, and it looks like the File::Temp Perl module is needed in the TWiki installation as well. I will try it when I get a chance to connect to CPAN and download the module.

-- JohnRouillard - 18 Mar 2002

Code looks good. One small suggestion: it would be better to remove the dependency on File::Temp. We can generate a temporary file name from time() and the process ID, for example.

-- PeterThoeny - 22 Mar 2002

I've attached a revised version that removes the File::Temp dependency. Somewhat less portable and secure (File::Temp takes great care to avoid temp file race conditions that create security holes), but should work on Unix and Windows for temp directory locations, and is much easier to install.
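As a rough idea of what a dependency-free approach could look like, here is a sketch in Python rather than Perl. The environment variable names, their order, the fallback directories, and the `twiki-stats` prefix are all assumptions made for the example, not details taken from the attached script.

```python
import os
import time


def temp_dir():
    """Pick a temp directory from common environment variables (assumed order)."""
    for var in ("TEMP", "TMPDIR", "TMP"):
        path = os.environ.get(var)
        if path and os.path.isdir(path):
            return path
    return "/tmp" if os.path.isdir("/tmp") else "."


def temp_file_name(prefix="twiki-stats"):
    """Build a name from the prefix, the current time and the process ID."""
    name = "%s-%d-%d.tmp" % (prefix, int(time.time()), os.getpid())
    return os.path.join(temp_dir(), name)
```

A name built this way is predictable, so another process could create the file first. That is the kind of race condition File::Temp is designed to avoid, and it is the trade-off accepted here in exchange for easier installation.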

The temp dir code should probably be moved somewhere else, e.g. TWiki.pm, so that it's available to other scripts that need to create temp files. Ideally we would use File::Temp if it's installed, but default to the built-in code.

Let me know how it works - I've only tested it on a small log file to date.

UPDATE: Tested on Perl 5.005_03 on Linux, as well as 5.6.1 on Windows. New version attached that avoids the [:alnum:] construct (not supported until Perl 5.6).

-- RichardDonkin - 24 Mar 2002

Now in CVS for TWikiAlphaRelease. Still has quite a lot of debugging code, commented out - probably worth keeping this in for future enhancement work.

-- RichardDonkin - 25 Mar 2002

Great, thanks Richard! The statistics script does work now at TWiki.org after updating it to the latest TWikiAlphaRelease.

This month's log200203.txt file has 22 MB and 244K lines so far. The script sits for 40 seconds while doing the calculations (before updating the WebStatistics topic in each web). This is perfectly fine; performance is not an issue because the script is called by a scheduled task.

-- PeterThoeny - 27 Mar 2002

Thanks for the feedback - 40 seconds is not bad for a Perl script handling 22 MB of data! I've just done an update in CVS to handle the renaming of topics, also attached here - this would have been quite hard with the original code.

-- RichardDonkin - 27 Mar 2002

Have a look at http://twiki.org/cgi-bin/view/Plugins/WebStatistics?rev=1.258 - it seems that this run of the script introduced the spurious 'Jan 2001' line. Perhaps you could send me the SourceForge logs for TWiki, so I can debug this locally (unless you feel like doing this)?

-- RichardDonkin - 28 Mar 2002

I removed the debug messages that were written whenever the statistics script encountered a problem line, because they blow up the debug.txt file without much gain. The debug.txt file at TWiki.org had grown to 187 MB! The change is in TWikiAlphaRelease and installed at TWiki.org.

-- PeterThoeny - 20 Oct 2002

Topic revision: r15 - 2005-02-15 - SamHasler
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.