Making multiple API calls in parallel using Python (IPython)

I work with Python (IPython and Canopy) and the RESTful content API on my local machine (Mac).

I have an array of 3000 unique identifiers to retrieve data from the API and can only call the API with one ID at a time.

I was hoping somehow to make 3 sets of 1000 calls in parallel to speed up the process.

What is the best way to do this?

Thanks in advance for your help!

2 answers

Without more detail about what you are doing in particular, it's hard to say for sure, but a simple threaded approach may make sense.

Assuming you have a simple function that processes a single ID:

```python
import requests

url_t = "http://localhost:8000/records/%i"

def process_id(id):
    """process a single ID"""
    # fetch the data
    r = requests.get(url_t % id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % id, data=data)
    return data
```

You can expand this into a simple function that processes a series of identifiers:

```python
def process_range(id_range, store=None):
    """process a number of ids, storing the results in a dict"""
    if store is None:
        store = {}
    for id in id_range:
        store[id] = process_id(id)
    return store
```

and finally, you can quite easily map sub-ranges onto threads, so that some number of requests run concurrently:

```python
from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids, store))
        threads.append(t)

    # start the threads
    [t.start() for t in threads]
    # wait for the threads to finish
    [t.join() for t in threads]
    return store
```
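The `id_range[i::nthreads]` slicing splits the work round-robin, so each thread gets roughly `len(id_range) / nthreads` IDs. A small self-contained illustration of that striding:

```python
# round-robin split of 10 IDs across 3 threads via extended slicing
id_range = list(range(10))
nthreads = 3
chunks = [id_range[i::nthreads] for i in range(nthreads)]
print(chunks)  # → [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```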

Full example in an IPython notebook: http://nbviewer.ipython.org/5732094

If your individual tasks vary widely in how long they take, you may want to use a ThreadPool, which hands out tasks one at a time (often slower if individual tasks are very small, but it guarantees better load balancing in heterogeneous cases).
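A minimal sketch of the ThreadPool variant, using the standard library's `multiprocessing.pool.ThreadPool`. Here `process_id` is a stand-in for the real function above (the real one would issue the HTTP requests), so the snippet runs on its own:

```python
from multiprocessing.pool import ThreadPool

def process_id(id):
    # stand-in for the real API call, just so the sketch is runnable
    return id * 2

def pooled_process_range(nthreads, id_range):
    """let a thread pool hand IDs out one at a time for better load balance"""
    pool = ThreadPool(nthreads)
    try:
        # map preserves input order, dispatching tasks as threads free up
        results = pool.map(process_id, id_range)
    finally:
        pool.close()
        pool.join()
    return dict(zip(id_range, results))

store = pooled_process_range(3, range(10))
print(store[4])  # → 8
```

Because the pool assigns one ID at a time, a thread that gets stuck on a slow request does not hold up a whole pre-assigned sub-range, unlike the round-robin split above.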


I think you should take a look at Python's "multiprocessing" module. This question describes what I think you are trying to do: how to speed up API requests?
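For completeness, a hedged sketch of what a multiprocessing version might look like. `process_id` is again a placeholder for the real API call; note that for I/O-bound HTTP requests, threads (as in the other answer) are usually sufficient and cheaper than spawning processes:

```python
from multiprocessing import Pool

def process_id(id):
    # placeholder for the real API call
    return id + 100

if __name__ == "__main__":
    # the __main__ guard is required so worker processes can safely
    # re-import this module on platforms that spawn new interpreters
    with Pool(processes=3) as pool:
        results = pool.map(process_id, range(6))
    print(results)  # → [100, 101, 102, 103, 104, 105]
```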


Source: https://habr.com/ru/post/1485020/
