Runaway Apache Processes

Need for Analysis

The following is adapted from this post:

When dealing with runaway Apache processes, it helps tremendously if you can figure out which URIs the runaway httpd processes are trying to serve. Once we know what the problem is, Horde developers can give much better advice and, if appropriate, try to fix it.

For example, we recently ran into a Horde installation that was seeing runaway processes. Knowing only that did not help much with debugging; after all, a full Horde installation contains tens of thousands of lines of code. However, analysis of the runaway httpd processes showed a specific URI call to a page in Kronolith, with one specific parameter passed in. That let me track the problem down within an hour and make this tiny change: http://lists.horde.org/archives/cvs/Week-of-Mon-20070115/064855.html

Here was the server analysis that sparked the discovery (thanks nuno):


20:19:59 up 29 days,  3:14,  1 user,  load average: 55.16, 38.82, 23.92
  412 processes: 403 sleeping, 8 running, 1 zombie, 0 stopped
  CPU states:  82.5% user,  17.5% system,   0.0% nice,   0.0% idle
  Mem:   3883356K total,  3865044K used,    18312K free,   207128K buffers
  Swap:   979924K total,    61724K used,   918200K free,  2434148K cached

    PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
   4858 webmail   25   0 29196  28M 24876 R    71.7  0.7  17:39 httpd
  27948 webmail   25   0 31888  31M 27104 R    13.5  0.8   6:20 httpd

  PID   REQUEST
  4858  GET /kronolith/year.php?year=1104543267 HTTP/1.1
  27948 GET /kronolith/year.php?year=1199151268 HTTP/1.1

The biggest issue was that I could not reproduce this on my local machine while debugging, because the runaway process turned out to be caused by a PHP bug. Only after I had narrowed the potential issues down to a few lines of code could I cross-reference them with the PHP bugs database. That showed the installation was running an older version of PHP with a bug in the mktime() function, while my local installation was unaffected because it ran a newer version of PHP.

As for how to do this server analysis, that's not something I can personally help with; it varies depending on your setup. But if you can at least track the runaway processes down to a specific URI (and most likely a specific e-mail message), we will be more than happy to try to track the problem down.

How-to Analyze

Here is information useful in linking the PIDs to the runaway requests (adapted from this post):

I got the association between PIDs and requests through Apache's "server-status" page, which is provided by mod_status. What I usually do is run 'top' and 'lynx -dump http://my.server/server-status' in separate windows. I start top first, then run lynx, and then press 'space' followed by 'q' in top (refresh once, then quit), so I'm almost sure the PIDs reflect the requests in question. Remember that in most configurations one Apache process handles several requests over its lifetime, so a PID alone does not identify a request.
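The same correlation can be scripted. The following is a minimal Python sketch, not a Horde tool, and it rests on several assumptions: mod_status is enabled and its HTML table has "PID" and "Request" columns (on older Apache versions this per-request table may require ExtendedStatus On), the status page is reachable at a hypothetical http://localhost/server-status, and the processes are named httpd (Debian-style systems use apache2). The helper names are hypothetical. It lists httpd PIDs above a CPU threshold using ps and prints what server-status reports for each.

```python
import subprocess
import urllib.request
from html.parser import HTMLParser


class _StatusTableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate an unclosed previous cell
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None


def requests_by_pid(status_html):
    """Map PID -> list of request lines from a server-status HTML page.

    A list is used because threaded MPMs show several slots per PID.
    """
    parser = _StatusTableParser()
    parser.feed(status_html)
    parser.close()
    result, header, pid_col, req_col = {}, None, None, None
    for row in parser.rows:
        if "PID" in row and "Request" in row:
            header = row
            pid_col, req_col = row.index("PID"), row.index("Request")
            continue
        # Skip rows before the header and rows of other tables (e.g. the legend).
        if header is None or len(row) != len(header):
            continue
        if row[pid_col].isdigit():  # dead slots show "-" instead of a PID
            result.setdefault(int(row[pid_col]), []).append(row[req_col])
    return result


def busy_pids(ps_output, min_cpu):
    """Parse "PID %CPU" lines from ps; return PIDs at or above min_cpu."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and float(fields[1]) >= min_cpu:
            pids.append(int(fields[0]))
    return pids


def main(status_url="http://localhost/server-status", min_cpu=10.0):
    # Hypothetical defaults: adjust the URL and process name to your setup.
    ps = subprocess.run(["ps", "-C", "httpd", "-o", "pid=", "-o", "pcpu="],
                        capture_output=True, text=True).stdout
    with urllib.request.urlopen(status_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    reqs = requests_by_pid(html)
    for pid in busy_pids(ps, min_cpu):
        for request in reqs.get(pid, ["(not listed in server-status)"]):
            print(pid, request)
```

Calling main() on the web server prints one line per busy PID and request. Note that ps reports %CPU averaged over each process's lifetime rather than top's current snapshot, so min_cpu is only a rough filter; cross-check against top when in doubt.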

If you run Linux and want to know which user is being served by that request, you can also run 'ls -l /proc/<pid>/fd' to find the session file the process has open, and then open that session file to read its contents.
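The /proc lookup can also be done programmatically. This is a minimal Python sketch that assumes PHP's default "files" session handler, which names session files sess_<id> and keeps the file open while a request is running; the helper names are hypothetical. Reading another user's /proc/<pid>/fd generally requires root or the same UID.

```python
import os


def open_session_files(pid, prefix="sess_"):
    """Return paths of files open by `pid` whose names start with `prefix`.

    Linux-only: resolves the symlinks under /proc/<pid>/fd.
    """
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed between listdir() and readlink()
        if os.path.basename(target).startswith(prefix):
            paths.append(target)
    return sorted(paths)


def read_session(path):
    """Return the raw serialized PHP session data stored in `path`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

For a runaway PID from top, something like `for p in open_session_files(4858): print(p, read_session(p))` shows the serialized session data, where the logged-in user can usually be spotted.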
