
Question

We had several view processes running for hours. They had to be killed on the web server. This TWiki was installed about a month prior.

> > > Subject: Processes
> > >
> > > Combined, these processes were eating up all the memory (2 gigs)
> > > and swap (9 gigs) on NWSWWW.
> > >
> > > I killed them all around 9:40 last night.
> > >
> > > perf_dat 19774 23252 1 18:27:40 ? 106:44 /usr/bin/perl -wT view.pl
> > > perf_dat 27836 683 1 14:55:14 ? 210:30 /usr/bin/perl -wT view.pl
> > > perf_dat 20040 755 1 20:04:59 ? 49:27 /usr/bin/perl -wT view.pl
> > > perf_dat 23023 1138 1 14:41:20 ? 208:40 /usr/bin/perl -wT view.pl

I tried to work out from the log file which topics were being viewed. In those topics I did not see any obvious recursion or anything else that might explain this behavior.

We have had at least one other TWiki in use on the same server for the past few years, and have not seen this problem there.

Environment

TWiki version: TWikiRelease01Feb2003
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: Solaris 5.8
Web server: Apache 1.3.26
Perl version: 5.005_03
Client OS: Solaris 5.6
Web Browser: Mozilla 1.3.1

-- JohnBlevin - 05 Jan 2004

Answer

TWikiRelease01Feb2003 has no known recursive loop. However, I just fixed the CalendarPlugin, which had one under certain circumstances involving an including and an included topic.

To debug, try to disable all Plugins (see TWiki.cfg). Check also if Apache's error log has something unusual.

-- PeterThoeny - 06 Jan 2004

More Info

I found out later that one of our users had closed the browser while the view script was still executing. They did this several times during the course of the day. Could there be a problem with view processes continuing to run and chewing up resources if the browser is terminated before the view script completes?

-- JohnBlevin - 09 Jan 2004

I imagine this is possible - Apache does have a feature to limit the CPU used by processes, but I would have thought it would just kill the child process when closing its file descriptors (which should be part of finishing the HTTP transaction). However, researching this sounds worthwhile!

-- RichardDonkin - 10 Jan 2004

This happened yesterday and again this morning with both bin/view and bin/rdiff on twiki version 10 Mar 2004. My sysadmin is getting unhappy as it is noticeably impairing the other sites being hosted.

I went through data/log*.txt, viewing and rdiffing some of the same pages, and couldn't duplicate the problem at that time.

The Apache error log contains lines like: "Premature end of script headers: rdiff"

LATER: I went through the log and opened all the pages viewed since midnight today. Nothing untoward happened until I opened the last ~15 pages at once and then closed the browser before they finished loading. The Apache error log has some "Premature end of script headers" entries for this morning, but no errors for the 3 runaway rdiff processes I just kicked off now. I've renamed rdiff to .rdiff so it can't be called.

LATER STILL: The "Premature end of script headers" message occurs when the admin kills the runaway process, so it's a resultant error, not a causative one.

-- MattWilkie - 26 Mar 2004

Hey Matt,

is this running with RcsLite or RcsWrap?

-- SvenDowideit - 27 Mar 2004

Lite.

Renaming rdiff has helped, but not much, as it still happens with view. :-(

-- MattWilkie - 27 Mar 2004

This isn't much help in debugging this problem, but on Apache servers where you have admin access, you can use the CPU resource limits available in Apache 1.2 and higher (the RLimitCPU directive) to kill runaway processes automatically.

-- RichardDonkin - 27 Mar 2004
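
For reference, the Apache limits mentioned above are set with the RLimitCPU directive (and RLimitMEM for memory). A minimal sketch of what this might look like in httpd.conf; the numbers are illustrative assumptions, not tuned values:

```apache
# Soft limit of 60 CPU-seconds and hard limit of 120 CPU-seconds for
# processes Apache forks to serve a request (e.g. CGI scripts like bin/view).
# Values are illustrative only.
RLimitCPU 60 120

# Optional: cap memory per forked process (bytes; soft 256 MB, hard 512 MB).
# Values are illustrative only.
RLimitMEM 268435456 536870912
```

These directives map onto the operating system's setrlimit resource limits, so a CGI process that exceeds its soft CPU limit is normally sent SIGXCPU, which terminates it by default, rather than being slowed down. The limits apply to processes forked to handle a request, not to the Apache child processes themselves.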

#!/usr/bin/perl -w
# Ugly hack of a script to kill CGIs which
# have gone out of control
#
# 1) run top, get PIDs of hung processes
# 2) put them in the kill line below
# 3) point browser at bin/kill-procs
# 4) edit again and comment out the kill line, remove the PIDs <-- DON'T SKIP THIS STEP!

# `kill -9 ### ###`;

# A CGI response needs a blank line between the headers and the body;
# without it Apache reports "Premature end of script headers".
print <<THIS;
Content-Type: text/html

<html><body>Killed processes</body></html>
THIS

Thanks to MS for helping give me a band-aid. On Monday I'll follow up with the sysadmin and see if the CPU limit thing is feasible. (Does that actually kill processes or just slow them down? The description is not clear.)

-- MattWilkie - 28 Mar 2004
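
Step 1 of the kill script above (finding the hung PIDs by hand) could be partly automated. A minimal sketch, assuming "ps -ef" style lines like those quoted in the question; the helper names cpu_seconds and runaway_pids and the one-hour threshold are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Convert a ps TIME field ("mm:ss" or "hh:mm:ss") to seconds.
sub cpu_seconds {
    my ($time) = @_;
    my $secs = 0;
    $secs = $secs * 60 + $_ for split /:/, $time;
    return $secs;
}

# Return the PIDs of processes whose command matches $pattern and whose
# accumulated CPU time exceeds $limit seconds.
# Assumes columns in the order: UID PID PPID C STIME TTY TIME CMD.
sub runaway_pids {
    my ( $pattern, $limit, @lines ) = @_;
    my @pids;
    for my $line (@lines) {
        my ( $uid, $pid, $ppid, $c, $stime, $tty, $time, $cmd ) =
            split ' ', $line, 8;
        next unless defined $cmd && $cmd =~ $pattern;
        push @pids, $pid if cpu_seconds($time) > $limit;
    }
    return @pids;
}

# Sample lines taken from the listing quoted in the question.
my @sample = (
    'perf_dat 19774 23252 1 18:27:40 ? 106:44 /usr/bin/perl -wT view.pl',
    'perf_dat 20040 755 1 20:04:59 ? 49:27 /usr/bin/perl -wT view.pl',
);

# Report view.pl processes with more than one hour of CPU time.
print join( ' ', runaway_pids( qr/view\.pl/, 3600, @sample ) ), "\n";
```

On a live system the sample list would be replaced by the output of ps. Note that ps prints STIME as two words (for example a month and a day) for older processes, which this simple whitespace split does not handle.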

SvenDowideit may have squashed the bug over the weekend. In RcsLite.pm, change $version == $target to $version <= $target. Going on 1836 hours now with no hung CGIs!

Index: lib/TWiki/Store/RcsLite.pm
===================================================================
RCS file: /cvsroot/twiki/twiki/lib/TWiki/Store/RcsLite.pm,v
retrieving revision 1.11
diff -r1.11 RcsLite.pm
834a835
>
838c839
< if( $version == $target ) {
---
> if( $version <= $target ) { 

Okay I'm pretty sure we can say this bug is now squashed.

-- MattWilkie - 29,30 Mar 2004
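
The diff above changes a termination test from == to <=. The surrounding RcsLite.pm code is not shown in this thread, so the following is only a hypothetical sketch (the function steps_to_target and its step cap are invented for illustration) of why an equality test can fail to terminate when a counter counting down starts below its target, while <= stops at once:

```perl
use strict;
use warnings;

# Hypothetical stand-in for walking revisions down from $version to $target.
# $strict_eq selects the original test (==) or the patched one (<=).
# $max_steps is a safety cap added only so that this demo always terminates.
sub steps_to_target {
    my ( $version, $target, $strict_eq, $max_steps ) = @_;
    my $steps = 0;
    while ( $steps < $max_steps ) {
        $steps++;    # one revision processed per step
        my $done = $strict_eq
            ? ( $version == $target )
            : ( $version <= $target );
        return $steps if $done;
        $version--;
    }
    return;          # gave up: without the cap this would never stop
}

# Normal case: start at 5, target 2. Both tests stop after the same steps.
print steps_to_target( 5, 2, 1, 1000 ), "\n";
print steps_to_target( 5, 2, 0, 1000 ), "\n";

# Start already below the target: == never matches, <= stops immediately.
my $r = steps_to_target( 1, 2, 1, 1000 );
print defined $r ? $r : "no termination", "\n";
print steps_to_target( 1, 2, 0, 1000 ), "\n";
```

In a real CGI there is no step cap, so the == variant would keep running (and, if recursive, keep allocating memory) until the process is killed.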

If you do not have root access on your hosted site see QuickAndDirtyExecUtilityForHostedSites

-- PeterThoeny - 01 Apr 2004

It's great that you found a fix for the problem! I was really psyched, until I found that our configuration is using RcsWrap not RcsLite. So, the recursion problem in RcsLite is apparently not what is causing the problem in our case.

Is there a similar bug for RcsWrap? I took a look through the file, but didn't find similar recursion...

The only other thing I found is that there is another TWiki site on our web server, which has CGI scripts identical to ours, except that theirs have as the first line

#!/opt/exp/bin/perl -wT [version 5.6.1]

but ours is

#!/usr/bin/perl -wT [version 5.005_03]

The two TWikis also have a different $safeEnvPath and $rcsDir (though the RCS version in both $rcsDir's is 5.7) ...

-- JohnBlevin - 22 Apr 2004

I would try changing the first line of your scripts to match the newer perl version.

The RcsLite bug just exacerbated a bug already present in Apache2. There is a workaround, see TWikiOnApache2dot0Hangs, Owiki:ApacheTwoHangs, TimeOutSavingTWikiPreferences.

-- MattWilkie - 23 Apr 2004

Our web administrator says we are running Apache 1.3.26. I guess that rules out our problem being related to Apache 2. I'm going to update our scripts to point to perl 5.6.1; other than that, I have no leads at this point. Any ideas/suggestions?

-- JohnBlevin - 26 Apr 2004

After nearly 10 months of problem-free operation, this one bit us again today.

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
18136 perf_dat   1  10    0  366M  343M cpu1   17:43 27.90% view.pl
17964 perf_dat   1  10    0  366M  343M cpu1   17:45 27.61% view.pl
20369 perf_dat   1   0    0  325M  302M run    15:46 27.49% view.pl
  263 root       6  58    0 7448K 5456K sleep  51:48  0.37% automountd
 2117 hosu       1  58    0 1776K 1456K cpu3    0:00  0.24% top-sun4u-5.8

We're still not using Apache 2 and not using RcsLite. If this happens again they might pull the plug on our TWiki. Does anybody have any new info on this?

-- JohnBlevin - 27 Oct 2004

Topic revision: r17 - 2004-10-27 - JohnBlevin
 