We have a script that periodically downloads documents from different sources. I'm moving it over to Celery, and while doing so I'd like to take advantage of connection pooling, but I'm not sure how to go about it.
My current thought is to do something like this with Requests:
    import celery
    import requests

    s = requests.Session()

    @celery.task(max_retries=2)
    def get_doc(url):
        doc = s.get(url)
But I'm worried that the connections will be kept open indefinitely.
I really only need connections to stay open while I process new documents.
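To make that concrete, here's a rough sketch of the cleanup I have in mind (untested; I'm assuming Celery's worker_process_shutdown signal is the right hook for this):

    import requests
    from celery.signals import worker_process_shutdown

    # One session per worker process, shared by every task in that process.
    s = requests.Session()

    @worker_process_shutdown.connect
    def close_session(**kwargs):
        # Close the pooled connections when the worker process exits,
        # so they don't linger after the batch is finished.
        s.close()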
Alternatively, something like this might work:
    import celery
    import requests

    def get_all_docs():
        docs = Doc.objects.filter(some_filter=True)
        s = requests.Session()
        for doc in docs:
            t = get_doc.delay(doc.url, s)

    @celery.task(max_retries=2)
    def get_doc(url, s):
        doc = s.get(url)
However, in this case I'm not sure whether the session will actually be shared across the different task instances, or whether requests will just open new connections once the session has been pickled and unpickled.
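As far as I can tell, a Session can't be passed as a task argument without being serialized, so another idea I've sketched out, assuming Celery's worker_process_init signal fires the way I expect, is to give each worker process its own session instead of passing one around:

    import requests
    from celery.signals import worker_process_init

    s = None

    @worker_process_init.connect
    def init_session(**kwargs):
        # Create the session only after the worker process has been forked,
        # so nothing has to be pickled across the process boundary.
        global s
        s = requests.Session()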
Finally, I could try Celery's experimental support for applying task decorators to class methods, so something like this:
    import celery
    import requests

    class GetDoc(object):
        def __init__(self):
            self.s = requests.Session()

        @celery.task(max_retries=2)
        def get_doc(self, url):
            doc = self.s.get(url)
This last one looks like the best approach, and I'm going to test it; but I was wondering whether anyone here has already done something similar, or, failing that, whether any of you can suggest a better approach than the ones above.
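For reference, the variant of the class-based idea I'd test first is a custom Task base class that lazily creates one session per worker process. This is only a sketch (the app name and broker URL are placeholders, and it assumes the app-based Celery API):

    import celery
    import requests

    app = celery.Celery('downloader', broker='redis://localhost:6379/0')

    class SessionTask(celery.Task):
        _session = None

        @property
        def session(self):
            # Create the session on first use inside the worker process.
            if self._session is None:
                self._session = requests.Session()
            return self._session

    @app.task(base=SessionTask, bind=True, max_retries=2)
    def get_doc(self, url):
        # self is the task instance, so self.session reuses the
        # process-local pooled connections across task invocations.
        return self.session.get(url).text

Since Celery instantiates a task class once per worker process, the property should give each process a single pooled session without anything having to be pickled.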