Apache / mod_wsgi process dies unexpectedly

I am testing the limits of my Python Flask web application, running on an Apache web server, by executing a request that takes more than 30 minutes to complete. The request performs thousands of database queries (one after another) against a MySQL database. I understand that, ideally, this should run as a separate asynchronous process outside the Apache server, but please ignore that for now. The problem I am facing is that although the request completes fully when I test it on my Mac, the process dies abruptly when run on a Linux server (Amazon Linux on AWS EC2). I cannot figure out what exactly is killing it. I checked whether the server was running out of memory: it is not, and the process uses very little RAM. I was not able to find any Apache configuration parameter or any error message that makes sense to me (even after raising the Apache LogLevel to debug). Please help me figure out where to look. Here are the details of my setup:


Execution times

Server: The process died after 8 minutes, 27 minutes, 21 minutes and 22 minutes in four separate runs. Note that most of these runs were on the UAT server, and this was the only request the server was processing.

Mac: It ran much slower than on the server, but the process completed successfully, taking 2 hours 47 minutes.


Linux Server Details:
2 virtual CPUs and 4 GB of RAM

OS ( uname -a output)
Linux ip-172-31-63-211 3.14.44-32.39.amzn1.x86_64 #1 SMP Thu Jun 11 20:33:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Apache error_log: https://drive.google.com/file/d/0B3XXZfJyzJYsNkFDU3hJekRRUlU/view?usp=sharing

Apache configuration file: https://drive.google.com/file/d/0B3XXZfJyzJYsM2lhSmxfVVRNNjQ/view?usp=sharing

Apache version ( apachectl -V output)

 Server version: Apache/2.4.23 (Amazon)
 Server built:   Jul 29 2016 21:42:17
 Server's Module Magic Number: 20120211:61
 Server loaded:  APR 1.5.1, APR-UTIL 1.4.1
 Compiled using: APR 1.5.1, APR-UTIL 1.4.1
 Architecture:   64-bit
 Server MPM:     prefork
   threaded:     no
     forked:     yes (variable process count)
 Server compiled with....
  -D APR_HAS_SENDFILE
  -D APR_HAS_MMAP
  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
  -D APR_USE_SYSVSEM_SERIALIZE
  -D APR_USE_PTHREAD_SERIALIZE
  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
  -D APR_HAS_OTHER_CHILD
  -D AP_HAVE_RELIABLE_PIPED_LOGS
  -D DYNAMIC_MODULE_LIMIT=256
  -D HTTPD_ROOT="/etc/httpd"
  -D SUEXEC_BIN="/usr/sbin/suexec"
  -D DEFAULT_PIDLOG="/var/run/httpd/httpd.pid"
  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
  -D DEFAULT_ERRORLOG="logs/error_log"
  -D AP_TYPES_CONFIG_FILE="conf/mime.types"
  -D SERVER_CONFIG_FILE="conf/httpd.conf"

Mac Details:

Apache configuration file: https://drive.google.com/file/d/0B3XXZfJyzJYsRUd6NW5NY3lON1U/view?usp=sharing

Apache version ( apachectl -V output)

 Server version: Apache/2.4.18 (Unix)
 Server built:   Feb 20 2016 20:03:19
 Server's Module Magic Number: 20120211:52
 Server loaded:  APR 1.4.8, APR-UTIL 1.5.2
 Compiled using: APR 1.4.8, APR-UTIL 1.5.2
 Architecture:   64-bit
 Server MPM:     prefork
   threaded:     no
     forked:     yes (variable process count)
 Server compiled with....
  -D APR_HAS_SENDFILE
  -D APR_HAS_MMAP
  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
  -D APR_USE_FLOCK_SERIALIZE
  -D APR_USE_PTHREAD_SERIALIZE
  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
  -D APR_HAS_OTHER_CHILD
  -D AP_HAVE_RELIABLE_PIPED_LOGS
  -D DYNAMIC_MODULE_LIMIT=256
  -D HTTPD_ROOT="/usr"
  -D SUEXEC_BIN="/usr/bin/suexec"
  -D DEFAULT_PIDLOG="/private/var/run/httpd.pid"
  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
  -D DEFAULT_ERRORLOG="logs/error_log"
  -D AP_TYPES_CONFIG_FILE="/private/etc/apache2/mime.types"
  -D SERVER_CONFIG_FILE="/private/etc/apache2/httpd.conf"
3 answers

If you are using mod_wsgi's embedded mode, this can happen because Apache controls the lifetime of its worker processes and can recycle them if it decides they are no longer needed due to low traffic.

You may be thinking, "But I'm using daemon mode, not embedded mode" — except you aren't, because your configuration is wrong. You have:

 <VirtualHost *:5010>
     ServerName localhost

     WSGIDaemonProcess entry user=kesiena group=staff threads=5
     WSGIScriptAlias "/" "/Users/kesiena/Dropbox (MIT)/Sites/onetext/onetext.local.wsgi"

     <Directory "/Users/kesiena/Dropbox (MIT)/Sites/onetext/app">
         WSGIProcessGroup start
         WSGIApplicationGroup %{GLOBAL}
         WSGIScriptReloading On

         Order deny,allow
         Allow from all
     </Directory>
 </VirtualHost>

This Directory block does not cover the directory that matches the target of WSGIScriptAlias, so none of the directives inside it are applied.

Use this instead:

 <VirtualHost *:5010>
     ServerName localhost

     WSGIDaemonProcess entry user=kesiena group=staff threads=5
     WSGIScriptAlias "/" "/Users/kesiena/Dropbox (MIT)/Sites/onetext/onetext.local.wsgi"

     <Directory "/Users/kesiena/Dropbox (MIT)/Sites/onetext">
         WSGIProcessGroup start
         WSGIApplicationGroup %{GLOBAL}

         Order deny,allow
         Allow from all
     </Directory>
 </VirtualHost>

The only reason anything worked at all without the paths matching is that you had separately opened up access for Apache to the files in that directory with:

 <Directory "/Users/kesiena/Dropbox (MIT)/Sites">
     Require all granted
 </Directory>

It is also bad practice to set DocumentRoot to a parent directory of where your application source code lives. The way it is written, there is a risk that someone could connect on a different port or VirtualHost and download all of your application source code.

Do not place your application code anywhere under the directory specified by DocumentRoot.

By the way, even when your WSGI application runs in daemon mode, Apache can still recycle the worker processes it uses to proxy requests through to mod_wsgi. So even if your very long request keeps running in the daemon process hosting the WSGI application, it may fail as soon as it starts sending the response, if the worker process that was proxying the request was recycled in the meantime because the request ran too long.
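If the long request has to stay inside Apache for now, the timeouts on the path between the Apache worker and the mod_wsgi daemon can be raised. The directive and option names below come from Apache 2.4 and mod_wsgi 4.x; the values are a sketch, not tested recommendations:

```apache
# Core Apache directive: timeout applied to network I/O on each
# request (default 60 seconds).
Timeout 3600

# mod_wsgi 4.x option: timeout on communication over the socket
# between the Apache worker and the daemon process.
WSGIDaemonProcess entry user=kesiena group=staff threads=5 socket-timeout=3600
```

Even with generous timeouts, a proxied 30-minute response is fragile; this only buys time, it does not remove the failure mode described above.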

You should definitely offload the long-running operation to a Celery task queue or something equivalent.
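To make that recommendation concrete, here is a minimal sketch of the offload pattern: the HTTP request only enqueues the job and returns an id, a background worker does the slow database work, and the client polls a status endpoint. This sketch uses a plain thread and an in-memory dict so it stays self-contained; Celery replaces these with a broker-backed queue and separate worker processes. All function and variable names here are illustrative, not from the asker's code.

```python
import threading
import uuid

# In-memory job store; Celery would keep this in a result backend.
jobs = {}  # job_id -> {"status": ..., "result": ...}

def long_running_query(job_id):
    # Placeholder for the thousands of sequential MySQL queries.
    total = sum(i * i for i in range(1000))
    jobs[job_id] = {"status": "done", "result": total}

def submit_job():
    """What the Flask view would do: enqueue the work and return at once."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    worker = threading.Thread(target=long_running_query, args=(job_id,))
    worker.start()
    return job_id, worker

def poll_job(job_id):
    """What a /status/<job_id> view would return to the polling client."""
    return jobs[job_id]["status"]

job_id, worker = submit_job()
worker.join()            # a real client would poll instead of joining
print(poll_job(job_id))  # prints: done
```

The key point is that the HTTP response is sent in milliseconds, so no Apache worker is ever tied up for 30 minutes; the slow work survives worker recycling because it no longer lives inside the request/response cycle.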


You may be running into forced closure of idle sockets, although given the times you quoted it does not look too likely. On a project I had on Azure, any connection that sat idle for about 3 minutes was closed by the platform. I believe those cutoffs happened in the network routing in front of the server, so there was no way to disable them or increase the timeout.


A couple of guesses.

Guess 1: I once had a similar problem. Have you experimented with KeepAlive and KeepAliveTimeout? Set the timeout to 60 minutes or more and check whether the problem persists. More details here: https://httpd.apache.org/docs/2.4/de/mod/core.html

Guess 2: Could Amazon be migrating your virtual machine in the background, interrupting the database connection? Or could Flask be failing to survive the VM being paused and resumed?


Source: https://habr.com/ru/post/1257605/

