Flask and/or Tornado - handling a time-consuming call to an external web service

I have a Flask application that connects to a given URL of an external service (response times vary, but are usually long) and searches for some things there. After that, it performs some CPU-heavy operations on the extracted data, which also takes some time.

My problem: the response from the external service may take a while. There is nothing I can do about that, but it becomes a big problem when several requests arrive at once: one Flask request to the external service blocks the thread, and the rest wait.

This is an obvious waste of time, and it is killing the application.

I have heard about an asynchronous library called Tornado. Here are my questions:

  • Does this mean that it can handle several requests at once and simply call a callback as soon as the external service responds?
  • Can I achieve this with my current Flask application (maybe not, because of WSGI?), or do I need to rewrite the whole application in Tornado?
  • What about the CPU-heavy operations: will they block my thread? Load balancing is a good idea anyway, but I'm curious how Tornado handles this.
  • Any possible traps or gotchas?
2 answers

The web server built into Flask is not intended for production use, for exactly the reasons you list: it is single-threaded and easily tied up if any request blocks for a non-trivial amount of time. The Flask documentation lists several options for deploying in a production environment: mod_wsgi, gunicorn, uWSGI, etc. All of these deployment options provide mechanisms for handling concurrency, whether through threads, processes, or non-blocking I/O. Note, however, that if you are performing CPU-bound operations, the only option that gives you true concurrency is to use multiple processes.
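As a rough illustration of why a worker model with several threads helps with blocking I/O, the sketch below compares serving four requests one after another with serving them from a small thread pool. It is not how mod_wsgi, gunicorn or uWSGI are implemented; `slow_external_call` is a hypothetical stand-in that uses `time.sleep` in place of the slow external service, and all function names are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_external_call(request_id, delay=0.2):
    """Hypothetical stand-in for a blocking HTTP call to a slow service."""
    time.sleep(delay)  # blocks only the calling thread
    return f"response-{request_id}"


def handle_serially(request_ids):
    """Single worker: each request waits for the previous one to finish."""
    return [slow_external_call(r) for r in request_ids]


def handle_with_threads(request_ids, workers=4):
    """Several worker threads: the blocking waits overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_external_call, request_ids))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = [1, 2, 3, 4]
    _, serial = timed(handle_serially, ids)
    _, threaded = timed(handle_with_threads, ids)
    print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```

Threads overlap the waiting, not the computing: for the CPU-bound part, separate processes are still needed, as noted above.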

If you want to use Tornado, you need to rewrite the application in Tornado style. Since its architecture is based on explicit asynchronous I/O, you cannot use its asynchronous features if you deploy it as a WSGI application. Tornado style basically means using non-blocking APIs for all I/O and using subprocesses to handle any lengthy CPU-bound operations. The Tornado documentation describes how to make asynchronous I/O calls, but here is a basic example of how this works:

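    from tornado import gen
    from tornado.httpclient import AsyncHTTPClient

    @gen.coroutine
    def fetch_coroutine(url):
        http_client = AsyncHTTPClient()
        # Yielding the future hands control back to the event loop
        # until the response arrives.
        response = yield http_client.fetch(url)
        return response.body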

The line response = yield http_client.fetch(url) is the asynchronous part: it returns control to the Tornado event loop when the request starts, and resumes once the response has been received. This lets you run multiple asynchronous HTTP requests at the same time, all in one thread. Note that anything you do inside fetch_coroutine that is not asynchronous I/O blocks the event loop, and no other requests can be processed while that code is executing.
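The difference between yielding to the event loop and blocking it can be seen in a small sketch. To stay dependency-free it uses the standard library's asyncio rather than Tornado itself (Tornado 5 and later run on top of the asyncio event loop, so the behaviour is analogous). `fake_fetch` and `blocking_fetch` are hypothetical stand-ins that simulate a 0.2 second response with a sleep; no real HTTP request is made.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for an asynchronous HTTP fetch."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return f"body of {url}"


async def blocking_fetch(url, delay=0.2):
    """Same shape, but the wait is a blocking call that stalls the loop."""
    time.sleep(delay)  # does NOT yield; nothing else can run meanwhile
    return f"body of {url}"


async def fetch_all(fetch, urls):
    """Start one coroutine per URL and wait for all of them."""
    return await asyncio.gather(*(fetch(u) for u in urls))


def timed_run(fetch, urls):
    """Run fetch_all on a fresh event loop; return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(fetch, urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    _, overlapped = timed_run(fake_fetch, urls)
    _, stalled = timed_run(blocking_fetch, urls)
    print(f"non-blocking: {overlapped:.2f}s, blocking: {stalled:.2f}s")
```

With the non-blocking version the three waits overlap in a single thread; with the blocking version they run back to back, which is what happens to every other request when a coroutine does blocking work.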

To handle long-running CPU-bound operations, you need to hand the work off to a subprocess to avoid blocking the event loop. In Python, this usually means using multiprocessing or concurrent.futures. I would look at this question for more information on how best to integrate these libraries with Tornado. Note that you do not want a process pool larger than the number of CPUs in the system, so think about how many CPU-bound operations you expect to run simultaneously when you work out how to scale beyond a single machine.
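A minimal sketch of handing CPU-bound work to a process pool from an event loop is shown below. It uses the standard library's asyncio and concurrent.futures rather than Tornado (Tornado's IOLoop offers a similar run_in_executor method, but that wiring is not shown here). `cpu_heavy` is a hypothetical placeholder for the real computation, and capping the pool at os.cpu_count() is one way to follow the sizing advice above.

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n):
    """Hypothetical CPU-bound step: sum of squares below n."""
    return sum(i * i for i in range(n))


async def process_many(executor, inputs):
    """Run cpu_heavy for each input in the executor without blocking the loop."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, cpu_heavy, n) for n in inputs]
    return await asyncio.gather(*futures)


async def main():
    # Keep the pool no larger than the number of CPUs on this machine.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = await process_many(pool, [10, 100, 1000])
    print(results)


# The guard matters: worker processes may re-import this module on start-up.
if __name__ == "__main__":
    asyncio.run(main())
```

Because `process_many` accepts any executor, the same coroutine can be exercised with a ThreadPoolExecutor in tests, while the process pool is used where the work really is CPU-bound.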

The Tornado documentation has a section on running behind a load balancer. It recommends using NGINX for this purpose.


Tornado seems better suited to this task than Flask. A subclass of tornado.web.RequestHandler, running on a tornado.ioloop.IOLoop instance, should give you non-blocking request handling. I expect it to look something like this:

    import json

    import tornado.ioloop
    import tornado.web


    class handler(tornado.web.RequestHandler):
        def post(self):
            # Reply to POST / with a small JSON body.
            self.write(json.dumps({'aaa': 'bbbbb'}))


    if __name__ == '__main__':
        app = tornado.web.Application([('/', handler)])
        # Binding to port 80 usually requires elevated privileges.
        app.listen(80, address='0.0.0.0')
        loop = tornado.ioloop.IOLoop.instance()
        loop.start()

If you want your post handler to be asynchronous, you can decorate it with tornado.gen.coroutine and make the outgoing calls with AsyncHTTPClient or grequests. This will give you non-blocking requests. You could also put your calculations in a coroutine, although I'm not quite sure about that.


Source: https://habr.com/ru/post/976078/

