Poor Django / uwsgi performance

I am running a Django application with nginx and uwsgi. This is how I run uwsgi:

sudo uwsgi -b 25000 --chdir=/www/python/apps/pyapp --module=wsgi:application --env DJANGO_SETTINGS_MODULE=settings --socket=/tmp/pyapp.socket --cheaper=8 --processes=16 --harakiri=10 --max-requests=5000 --vacuum --master --pidfile=/tmp/pyapp-master.pid --uid=220 --gid=499 
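For reference, a sketch of the same options as an ini file, which could be started with uwsgi --ini pyapp.ini. This is only an equivalent form for readability, not the configuration actually used (-b corresponds to buffer-size):

 [uwsgi]
 chdir        = /www/python/apps/pyapp
 module       = wsgi:application
 env          = DJANGO_SETTINGS_MODULE=settings
 socket       = /tmp/pyapp.socket
 master       = true
 processes    = 16
 cheaper      = 8
 harakiri     = 10
 max-requests = 5000
 buffer-size  = 25000
 vacuum       = true
 pidfile      = /tmp/pyapp-master.pid
 uid          = 220
 gid          = 499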

And these are my nginx configurations:

 server {
     listen 80;
     server_name test.com

     root /www/python/apps/pyapp/;

     access_log /var/log/nginx/test.com.access.log;
     error_log /var/log/nginx/test.com.error.log;

     # https://docs.djangoproject.com/en/dev/howto/static-files/#serving-static-files-in-production
     location /static/ {
         alias /www/python/apps/pyapp/static/;
         expires 30d;
     }

     location /media/ {
         alias /www/python/apps/pyapp/media/;
         expires 30d;
     }

     location / {
         uwsgi_pass unix:///tmp/pyapp.socket;
         include uwsgi_params;
         proxy_read_timeout 120;
     }

     # what to serve if upstream is not available or crashes
     #error_page 500 502 503 504 /media/50x.html;
 }

Here is the problem. When I run ab (ApacheBench) against the server, I get the following results:

nginx version: nginx/1.2.6

uwsgi version: 1.4.5

 Server Software:        nginx/1.0.15
 Server Hostname:        pycms.com
 Server Port:            80

 Document Path:          /api/nodes/mostviewed/8/?format=json
 Document Length:        8696 bytes

 Concurrency Level:      100
 Time taken for tests:   41.232 seconds
 Complete requests:      1000
 Failed requests:        0
 Write errors:           0
 Total transferred:      8866000 bytes
 HTML transferred:       8696000 bytes
 Requests per second:    24.25 [#/sec] (mean)
 Time per request:       4123.216 [ms] (mean)
 Time per request:       41.232 [ms] (mean, across all concurrent requests)
 Transfer rate:          209.99 [Kbytes/sec] received

When operating at 500 concurrency

 Concurrency Level:      500
 Time taken for tests:   2.175 seconds
 Complete requests:      1000
 Failed requests:        50
    (Connect: 0, Receive: 0, Length: 50, Exceptions: 0)
 Write errors:           0
 Non-2xx responses:      950
 Total transferred:      629200 bytes
 HTML transferred:       476300 bytes
 Requests per second:    459.81 [#/sec] (mean)
 Time per request:       1087.416 [ms] (mean)
 Time per request:       2.175 [ms] (mean, across all concurrent requests)
 Transfer rate:          282.53 [Kbytes/sec] received

As you can see, requests to the server fail with timeout errors, "Client prematurely disconnected", or:

 writev(): Broken pipe [proto/uwsgi.c line 124] during GET /api/nodes/mostviewed/9/?format=json 

Here is a little more about my application: basically, it is a collection of models that map to MySQL tables containing all the content. On the front end I use django-rest-framework to serve JSON content to clients.
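The actual views are not shown here; as a rough illustration only, the endpoint above is the kind of thing a django-rest-framework list view produces. All names in this sketch are hypothetical, not the real project code:

 # Hypothetical sketch of a DRF-style list endpoint such as
 # /api/nodes/mostviewed/<count>/?format=json
 from rest_framework import generics

 from nodes.models import Node                    # hypothetical app and model
 from nodes.serializers import NodeSerializer     # hypothetical serializer

 class MostViewedNodes(generics.ListAPIView):
     serializer_class = NodeSerializer

     def get_queryset(self):
         count = int(self.kwargs['count'])        # e.g. the "8" in the URL
         return Node.objects.order_by('-views')[:count]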

I installed django-profiling and django-debug-toolbar to see what is going on. With django-profiling, here is what I get when running a single request:

 Instance wide RAM usage

 Partition of a set of 147315 objects. Total size = 20779408 bytes.
  Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
      0  63960  43  5726288  28   5726288  28 str
      1  36887  25  3131112  15   8857400  43 tuple
      2   2495   2  1500392   7  10357792  50 dict (no owner)
      3    615   0  1397160   7  11754952  57 dict of module
      4   1371   1  1236432   6  12991384  63 type
      5   9974   7  1196880   6  14188264  68 function
      6   8974   6  1076880   5  15265144  73 types.CodeType
      7   1371   1  1014408   5  16279552  78 dict of type
      8   2684   2   340640   2  16620192  80 list
      9    382   0   328912   2  16949104  82 dict of class
 <607 more rows. Type eg '_.more' to view.>

 CPU Time for this request

 11068 function calls (10158 primitive calls) in 0.064 CPU seconds

 Ordered by: cumulative time

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/django/views/generic/base.py:44(view)
       1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/django/views/decorators/csrf.py:76(wrapped_view)
       1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/rest_framework/views.py:359(dispatch)
       1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/rest_framework/generics.py:144(get)
       1    0.000    0.000    0.064    0.064 /usr/lib/python2.6/site-packages/rest_framework/mixins.py:46(list)
       1    0.000    0.000    0.038    0.038 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:348(data)
    21/1    0.000    0.000    0.038    0.038 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:273(to_native)
    21/1    0.000    0.000    0.038    0.038 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:190(convert_object)
    11/1    0.000    0.000    0.036    0.036 /usr/lib/python2.6/site-packages/rest_framework/serializers.py:303(field_to_native)
   13/11    0.000    0.000    0.033    0.003 /usr/lib/python2.6/site-packages/django/db/models/query.py:92(__iter__)
     3/1    0.000    0.000    0.033    0.033 /usr/lib/python2.6/site-packages/django/db/models/query.py:77(__len__)
       4    0.000    0.000    0.030    0.008 /usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py:794(execute_sql)
       1    0.000    0.000    0.021    0.021 /usr/lib/python2.6/site-packages/django/views/generic/list.py:33(paginate_queryset)
       1    0.000    0.000    0.021    0.021 /usr/lib/python2.6/site-packages/django/core/paginator.py:35(page)
       1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/core/paginator.py:20(validate_number)
       3    0.000    0.000    0.020    0.007 /usr/lib/python2.6/site-packages/django/core/paginator.py:57(_get_num_pages)
       4    0.000    0.000    0.020    0.005 /usr/lib/python2.6/site-packages/django/core/paginator.py:44(_get_count)
       1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/query.py:340(count)
       1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/sql/query.py:394(get_count)
       1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/query.py:568(_prefetch_related_objects)
       1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/query.py:1596(prefetch_related_objects)
       4    0.000    0.000    0.020    0.005 /usr/lib/python2.6/site-packages/django/db/backends/util.py:36(execute)
       1    0.000    0.000    0.020    0.020 /usr/lib/python2.6/site-packages/django/db/models/sql/query.py:340(get_aggregation)
       5    0.000    0.000    0.020    0.004 /usr/lib64/python2.6/site-packages/MySQLdb/cursors.py:136(execute)
       2    0.000    0.000    0.020    0.010 /usr/lib/python2.6/site-packages/django/db/models/query.py:1748(prefetch_one_level)
       4    0.000    0.000    0.020    0.005 /usr/lib/python2.6/site-packages/django/db/backends/mysql/base.py:112(execute)
       5    0.000    0.000    0.019    0.004 /usr/lib64/python2.6/site-packages/MySQLdb/cursors.py:316(_query)
      60    0.000    0.000    0.018    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:231(iterator)
       5    0.012    0.002    0.015    0.003 /usr/lib64/python2.6/site-packages/MySQLdb/cursors.py:278(_do_query)
      60    0.000    0.000    0.013    0.000 /usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py:751(results_iter)
      30    0.000    0.000    0.010    0.000 /usr/lib/python2.6/site-packages/django/db/models/manager.py:115(all)
      50    0.000    0.000    0.009    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:870(_clone)
      51    0.001    0.000    0.009    0.000 /usr/lib/python2.6/site-packages/django/db/models/sql/query.py:235(clone)
       4    0.000    0.000    0.009    0.002 /usr/lib/python2.6/site-packages/django/db/backends/__init__.py:302(cursor)
       4    0.000    0.000    0.008    0.002 /usr/lib/python2.6/site-packages/django/db/backends/mysql/base.py:361(_cursor)
       1    0.000    0.000    0.008    0.008 /usr/lib64/python2.6/site-packages/MySQLdb/__init__.py:78(Connect)
 910/208    0.003    0.000    0.008    0.000 /usr/lib64/python2.6/copy.py:144(deepcopy)
      22    0.000    0.000    0.007    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:619(filter)
      22    0.000    0.000    0.007    0.000 /usr/lib/python2.6/site-packages/django/db/models/query.py:633(_filter_or_exclude)
      20    0.000    0.000    0.005    0.000 /usr/lib/python2.6/site-packages/django/db/models/fields/related.py:560(get_query_set)
       1    0.000    0.000    0.005    0.005 /usr/lib64/python2.6/site-packages/MySQLdb/connections.py:8()

.. etc.

However, the django-debug toolbar shows the following:

 Resource Usage
 Resource           Value
 User CPU time      149.977 msec
 System CPU time    119.982 msec
 Total CPU time     269.959 msec
 Elapsed time       326.291 msec
 Context switches   11 voluntary, 40 involuntary

and 5 queries in 27.1 ms.

The problem is that top shows the load average rising quickly, and ApacheBench, which I ran both on the local machine and from a remote machine on the network, shows that I am not serving many requests per second. What is the problem? This is as far as I could get by profiling the code, so it would be helpful if someone could point out what I am doing wrong here.

Edit (02/23/2013): Adding more details based on Andrew Alcock's answer. The points that require my attention / an answer are these:

(3)(3) I executed "show global variables" on MySQL and found that the max_connections setting is 151, which is more than enough to serve the workers I start for uwsgi.
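For reference, the check was along these lines; the second query is only an extra suggestion for seeing how many connections are actually open, not something from the original test:

 mysql> SHOW GLOBAL VARIABLES LIKE 'max_connections';   -- 151 here
 mysql> SHOW GLOBAL STATUS LIKE 'Threads_connected';    -- currently open connections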

(3)(4)(2) The single request I am profiling is the heaviest one. It executes 4 queries according to django-debug-toolbar. The queries run in 3.71, 2.83, 0.88 and 4.84 ms respectively.

(4) Do you mean memory paging here? If so, how can I tell?

(5) With 16 workers, a concurrency of 100 and 1000 requests, the load average goes up to ~12. I ran the tests with different numbers of workers (the concurrency level is 100); the ab invocation is sketched after the list:

  • 1 worker, load average ~1.85, 19 reqs/second, Time per request: 5229.520 [ms], 0 non-2xx
  • 2 workers, load average ~1.5, 19 reqs/second, Time per request: 516.520 [ms], 0 non-2xx
  • 4 workers, load average ~3, 16 reqs/second, Time per request: 5929.921 [ms], 0 non-2xx
  • 8 workers, load average ~5, 18 reqs/second, Time per request: 5301.458 [ms], 0 non-2xx
  • 16 workers, load average ~19, 15 reqs/second, Time per request: 6384.720 [ms], 0 non-2xx
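The runs were along these lines (a sketch only; the host and path are taken from the ab output above):

 ab -n 1000 -c 100 "http://pycms.com/api/nodes/mostviewed/8/?format=json"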

As you can see, the more workers we have, the more load we put on the system. And I can see in uwsgi's daemon log that the response time in milliseconds increases as I increase the number of workers.

With 16 workers running at 500 concurrency, uwsgi starts logging errors:

  writev(): Broken pipe [proto/uwsgi.c line 124] 

The load rises to ~10 as well, and the tests do not take long because the non-2xx responses are 923 out of 1000; that is why the run is so quick, as the responses are almost empty. This also answers your point #4 in the summary.

Assuming that what I am hitting here is OS I/O and network latency, what is the recommended action for scaling this? New hardware? A bigger server?

Thanks.

+46
python django django-rest-framework uwsgi
Feb 19 '13 at 16:21
3 answers

EDIT 1: Saw the comment that you have 1 virtual core; added commentary at all the relevant points.

EDIT 2: More information from Maverick, so I am eliminating the ideas that have been ruled out and developing the confirmed issues.

EDIT 3: Filled in more detail about the uwsgi request queue settings and scaling options. Improved grammar.

EDIT 4: Updates from Maverick and minor improvements.

Comments are too small for this, so here are some thoughts:

  • Load average is basically the number of processes that are running on or waiting for CPU attention. For a perfectly loaded system with 1 CPU core, the load average should be 1.0; for a 4-core system, it should be 4.0. The moment you run the web test, the thread count rockets and you have a lot of processes waiting for the CPU. Unless the load average exceeds the number of CPU cores by a significant margin, it is not a concern.
  • The first 'Time per request' value of 4 s correlates with the length of the request queue: 1000 requests were dumped on Django almost instantaneously and took 4 seconds on average to service, about 3.4 s of which were spent waiting in a queue. This is due to the very heavy mismatch between the number of concurrent requests (100) and the number of processes (16), which leaves 84 of the requests waiting for a process at any one moment.
  • Running at a concurrency of 100, the tests take 41 seconds at 24 requests/sec. You have 16 processes (effectively threads), so each request takes roughly 700 ms to process. Given your type of transaction, that is a long time per request. This may be because:

    • The per-request CPU cost in Django is high (which is unlikely, given the low CPU time reported by the debug toolbar)
    • The OS is task-switching a lot (especially if the load average is higher than 4-8), and the latency is purely down to having too many processes.
    • There are too few DB connections serving the 16 processes, so processes are waiting for one to become available. Do you have at least one connection available per process?
    • There is considerable latency around the database, either:

      • Tens of small queries, each taking, say, 10 ms, most of which is networking overhead. If so, can you introduce caching or reduce the number of SQL calls? (A generic sketch of both ideas follows this list.) Or
      • One or two heavy queries taking 100 ms or so. To check this, run profiling on the database. If so, you need to optimise that query.
  • The split between system and user CPU cost is unusually high in system, although the total CPU is low. This implies that most of the work in Django is kernel-related, such as networking or disk. In this scenario it might be network costs (e.g. receiving and sending HTTP requests and receiving and sending queries to the DB). Sometimes this is high because of paging. If there is no paging going on, then you probably do not have to worry about this at all.

  • You have set the processes to 16 but have a high load average (you don't say how high). Ideally you should always have at least one process waiting for CPU (so that the CPUs don't spin idly). The processes here don't seem CPU-bound, but have significant latency, so you need more processes than cores. How many more? Try running uwsgi with different numbers of processes (1, 2, 4, 8, 12, 16, 24, etc.) until you get the best throughput. If you change the latency of the average process, you will need to adjust this again.
  • The 500 concurrency level definitely is a problem, but is it the client or the server? The report says 50 of the requests had an incorrect content length, which implies a server problem. The non-2xx responses also seem to point there. Is it possible to capture the non-2xx responses for debugging? Stack traces or the specific error messages would be incredibly useful (EDIT) and this is caused by the uwsgi request queue running with its default size of 100.
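As a generic illustration of the "caching / fewer SQL calls" idea above (model and field names are hypothetical, not taken from the question):

 # Sketch only: collapse many small FK lookups into one or two round trips,
 # and cache the rendered response for a short period.
 from django.views.decorators.cache import cache_page

 from nodes.models import Node                    # hypothetical model

 def most_viewed_queryset(limit):
     return (Node.objects
             .select_related('author')            # JOIN instead of a query per object
             .prefetch_related('tags')            # one extra query instead of N
             .order_by('-views')[:limit])

 @cache_page(60)                                  # cache the whole response for 60 seconds
 def most_viewed_view(request, count):
     ...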

So in short:

[diagram: the HTTP request queue (capacity 100) sitting between the web server and the uwsgi worker processes]

  • Django seems fine
  • Mismatch between the concurrency of the load test (100 or 500) and the number of processes (16): you are pushing far too many concurrent requests into the system for the number of processes to handle. Once you go above the number of processes, all that happens is that you lengthen the HTTP request queue in the web server.
  • There is large latency, so either:

    • A mismatch between processes (16) and CPU cores (1): if the load average is > 3, then it is probably too many processes. Try again with a smaller number of processes:

      • Load average > 2 → try 8 processes
      • Load average > 4 → try 4 processes
      • Load average > 8 → try 2 processes
    • If the load average is < 3, it may be the database, so profile the DB to see whether there are lots of small queries (additively causing the latency) or one or two heavy SQL statements that are the problem (a sketch of enabling the slow query log follows this list)

  • Without capturing one of the failed responses, there is not much I can say about the failures at 500 concurrency
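One way to get that database profile, sketched here as a suggestion (the threshold is illustrative; recent MySQL versions let you set these at runtime):

 -- Log every statement slower than 50 ms, then inspect the slow query log
 SET GLOBAL slow_query_log = 'ON';
 SET GLOBAL long_query_time = 0.05;
 -- Sampling this counter before and after a single request gives a rough
 -- feel for how many queries that request issued
 SHOW GLOBAL STATUS LIKE 'Questions';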

Developing the ideas

Load averages > 10 on a single-core machine are really nasty and (as you observe) lead to a lot of task switching and generally slow behaviour. I personally don't remember ever seeing a machine with a load average of 19 (which is what you have for 16 processes); congratulations on getting it that high ;)

The database performance is great, so I would give that an all-clear right now.

Paging: To answer your question on how to see paging: you can detect OS paging in several ways. For example, in top the header has page-ins and page-outs (see the VM line below):

  Processes: 170 total, 3 running, 4 stuck, 163 sleeping, 927 threads          15:06:31
  Load Avg: 0.90, 1.19, 1.94  CPU usage: 1.37% user, 2.97% sys, 95.65% idle
  SharedLibs: 144M resident, 0B data, 24M linkedit.
  MemRegions: 31726 total, 2541M resident, 120M private, 817M shared.
  PhysMem: 1420M wired, 3548M active, 1703M inactive, 6671M used, 1514M free.
  VM: 392G vsize, 1286M framework vsize, 1534241 (0) pageins, 0 (0) pageouts.
  Networks: packets: 789684 / 288M in, 912863 / 482M out.
  Disks: 739807 / 15G read, 996745 / 24G written.
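(The header above is from OS X's top; on Linux an equally quick check, added here as a suggestion, is:)

  vmstat 1 5     # watch the si/so (swap-in / swap-out) columns;
                 # consistently non-zero values mean the box is paging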

Number of processes: In your current configuration the number of processes is far too high. Scale the number of processes back to 2. We may bring this value up later, depending on how much further load we shift off this server.

Location of the Apache Benchmark run: The load average of 1.85 for one process suggests to me that you are running the load generator on the same machine as uwsgi; is that correct?

If so, you really need to run it from another machine, otherwise the test runs are not representative of the actual load: you are taking memory and CPU away from the web processes for use in the load generator. In addition, the load generator's 100 or 500 threads will generally stress your server in a way that does not happen in real life. Indeed, this may be the reason the whole test fails.

Location of the database: The load average for one process also suggests that you are running the database on the same machine as the web processes; is this correct?

If I'm right about the database, then the first and best way to start scaling is to move the database to another machine. We do this for a couple of reasons:

  • A DB server needs a different hardware profile from a processing node:

    • Disk: a DB needs a lot of fast, redundant, backed-up disk, while a processing node needs only a basic disk
    • CPU: a processing node needs the fastest CPU you can afford, whereas a DB machine can often make do without, since its performance is usually gated on disk and RAM.
    • RAM: a DB machine generally needs as much RAM as possible (and the fastest DBs have all their data in RAM), whereas many processing nodes need much less (yours needs about 20 MB per process, which is very little)
    • Scaling: atomic DBs scale best by having monster machines with many CPUs, whereas the web tier, being stateless, scales by plugging in many identical small boxes.
  • CPU affinity: it is better for a CPU to have a load average of 1.0 and for processes to have affinity to a single core. Doing this maximises the use of the CPU cache and minimises task-switching overhead. By separating the DB and the processing nodes, you enforce this affinity in hardware.

500 concurrency with exceptions: The request queue in the diagram above holds at most 100 requests; if uwsgi receives a request while the queue is full, the request is rejected with a 5xx error. I think this is what happened in your 500 concurrency load test: basically the queue filled up with the first 100 or so threads, then the other 400 threads issued the remaining 900 requests and received immediate 5xx errors.

To handle 500 requests per second you need to ensure two things:

  • The request queue size is configured to handle the burst: use the --listen argument to uwsgi (see the sketch after this list)
  • The system can handle a throughput above 500 requests per second if 500 is the normal rate, or a bit below it if 500 is only a peak. See the scaling notes below.
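A sketch of the first point; the value is illustrative, and on Linux the kernel's own backlog limit has to be at least as large for the bigger queue to take effect:

  sysctl -w net.core.somaxconn=1024   # raise the kernel backlog limit
  uwsgi ... --listen 1024 ...         # raise the uwsgi request queue from its default of 100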

I imagine that uwsgi has the queue set to a small number by default to better handle DDoS attacks; if placed under huge load, most requests fail immediately with almost no processing, allowing the box as a whole to remain responsive to the administrators.

General system scaling guidelines

Your most important consideration is probably to maximise throughput. Another possible goal is to minimise response time, but I won't discuss that here. In maximising throughput, you are trying to maximise the system, not the individual components; some local reductions may improve overall system throughput (for example, a change that happens to add latency in the web tier in order to improve the performance of the DB is a net gain).

On to the specifics:

  • Move the database to a separate machine. After this, profile the database during your load test by running top and your favourite MySQL monitoring tool. You need to be able to profile it. Moving the DB to a separate machine will introduce some additional latency (several ms) per request, so expect to increase the number of processes at the web tier slightly to keep the same throughput.
  • Ensure that the uwsgi request queue is large enough to handle a burst of traffic, using the --listen argument. This should be several times the maximum steady-state requests per second your system can handle.
  • On the web/app tier: balance the number of processes against the number of CPU cores and the inherent latency in each process. Too many processes slows things down through task switching; too few means the system's resources are never fully used. There is no fixed balancing point, as every application and usage pattern is different, so benchmark and adjust. As a guide, use the processes' latency (a sketch of the arithmetic follows this list); if each task has:

    • 0% latency, then you need 1 process per core
    • 50% latency (i.e. the CPU time is half the actual elapsed time), then you need 2 processes per core
    • 67% latency, then you need 3 processes per core
  • Check with top during the test that you are above roughly 90% CPU utilisation (per core) and that the load average is a little above 1.0. If the load average is higher, scale back the processes. If all goes well, at some point you will have saturated the CPU, at which point adding more processes no longer helps.

  • Beyond that, scale the web tier horizontally: add more processing nodes and balance the load across them (uwsgi has built-in facilities for this; see Łukasz Mierzwa's answer below).
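A small illustration of the latency-to-processes arithmetic above; it is only a rule of thumb:

  # If a request spends latency_fraction of its wall time waiting (DB, network)
  # rather than on the CPU, you need roughly 1 / (1 - latency_fraction)
  # processes per core to keep that core busy.
  def processes_per_core(latency_fraction):
      return 1.0 / (1.0 - latency_fraction)

  for f in (0.0, 0.5, 0.67):
      print(f, round(processes_per_core(f), 1))   # 1.0, 2.0 and 3.0 processes per core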
+125
Feb 23 '13 at 6:53

Please run your benchmarks for much longer than a minute (5-10 minutes at least); you really won't get much information from such a short test. And use uWSGI's carbon plugin to push stats to a carbon/graphite server (you will need to have one); you will have much more information for debugging.

You are sending 500 concurrent requests to an app that can't handle such load; the connections it can't pick up sit in the listen queue (the backlog), which is 100 by default and on Linux is further limited by the kernel, so requests end up being dropped.

If your app takes ~42 ms to handle a single request, then one worker can handle at most 1000 / 42 = ~23 requests per second (assuming the DB and the rest of the stack don't degrade as concurrency goes up). So to handle 500 requests per second you need at least 500 / 23 = 21 workers (but in reality I would say more like 40); you have only 16, so no wonder it breaks under such load.
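The same arithmetic, spelled out:

 per_request_ms = 42.0                        # measured time per request
 per_worker_rps = 1000.0 / per_request_ms     # ~23.8 requests per second per worker
 workers_for_500_rps = 500 / per_worker_rps   # ~21 workers, assuming the DB keeps up
 print(round(per_worker_rps, 1), round(workers_for_500_rps, 1))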

EDIT: I mixed up rate with concurrency. At least 21 workers will let you handle 500 requests per second, not 500 concurrent requests. If you really want to handle 500 concurrent requests then you simply need 500 workers, unless you run your app in async mode; check the "Gevent" section in the uWSGI docs.
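A sketch of what async mode looks like on the command line; this is illustrative only and requires uWSGI built with gevent support plus a gevent-compatible application stack (including the MySQL driver):

 uwsgi --gevent 100 --chdir /www/python/apps/pyapp --module wsgi:application \
       --socket /tmp/pyapp.socket --master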

PS. uWSGI also comes with a great load balancer with backend auto-configuration (read the docs on the "Subscription Server" and the "FastRouter"). You can set it up so that you can hot-plug new backends as needed: you just start workers on a new node, they subscribe to the FastRouter and start receiving requests. This is the best way to scale horizontally, and with backends on AWS you can automate it so that new backends are started quickly when needed.

+6
Feb 24 '13 at 9:44

Getting fewer r/s as you add more workers means that your request is "pure CPU" and there is no IO wait that another worker could use to serve another request.

If you want to scale, you will need another server with more (or faster) CPUs.

However, this is a synthetic test; the number of r/s you get is an upper bound for the exact request you are testing. Once in production there are many more variables that can affect performance.

+1
Feb 24 '13 at 6:47


