What is the best way to send multiple HTTP requests in Python 3?

The idea is simple: I need to send multiple HTTP requests in parallel.

I decided to use the requests-futures library, which basically spawns multiple threads.

Now I have about 200 requests, and it's still pretty slow (about 12 seconds on my laptop). I also use a callback to parse the JSON response (as suggested in the library documentation). In addition, is there a rule of thumb for determining the optimal number of threads depending on the number of requests?
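Roughly, the setup looks like this (a minimal sketch of what I'm doing; the URL list, worker count, and parsing details are placeholders):

    from concurrent.futures import as_completed
    from requests_futures.sessions import FuturesSession

    URLS = []  # about 200 urls in practice

    def parse_json(response, *args, **kwargs):
        # response hook: attach the parsed JSON body to the response object
        response.data = response.json()

    session = FuturesSession(max_workers=10)
    futures = [session.get(url, hooks={'response': parse_json}) for url in URLS]

    for future in as_completed(futures):
        response = future.result()
        # work with response.data here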

Basically, I was wondering if I could speed up these queries.

1 answer

Since you are using Python 3.3, I recommend a stdlib solution that you will not find in the thread @njzk2 linked: concurrent.futures.

This is a higher-level interface than working with the threading or multiprocessing primitives directly. You get an Executor interface to handle the thread pool and asynchronous reporting.

There is an example in the docs that is mostly directly applicable to your situation, so I'll just post it here:

    import concurrent.futures
    import urllib.request

    URLS = []  # some list of urls

    # Retrieve a single page and return its contents
    def load_url(url, timeout):
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.read()

    # We can use a with statement to ensure threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
                # do json processing here
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))

You can replace urllib.request calls with requests calls if you want. For obvious reasons, I like requests more.
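For example, the load_url function could become something like this (a minimal sketch, assuming requests is installed and the responses are JSON; the rest of the example stays the same):

    import requests

    def load_url(url, timeout):
        # requests raises for connection errors; raise_for_status covers HTTP errors
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        return resp.json()  # parse the JSON body directly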

The API works something like this: you create a bunch of Future objects that represent the asynchronous execution of your function. You then use concurrent.futures.as_completed to get an iterator over the Future instances; it yields them as they complete.

Regarding your question:

In addition, is there a rule of thumb for determining the optimal number of threads depending on the number of requests?

No rule, really. It depends on too many things, including the speed of your internet connection. I will say that it doesn't really depend on the number of requests you have; it depends more on the hardware you're running on.

Fortunately, it's pretty easy to tune the max_workers kwarg and test for yourself. Start with 5 or 10 threads and increase in increments of 5. At some point you will probably notice performance plateauing and then starting to decrease, as the overhead of adding additional threads exceeds the marginal gain from increased parallelization (which is a word).
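A rough way to benchmark it yourself (just a sketch; the worker counts and URLS list are placeholders, and load_url is the function from the example above):

    import time

    # Time the same batch of requests at several pool sizes and compare.
    for workers in (5, 10, 15, 20, 25):
        start = time.time()
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
            list(executor.map(load_url, URLS))
        print('%2d workers: %.2f seconds' % (workers, time.time() - start))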
