I fetch and cache (for performance) a lot of URLs with something like:
import requests
import requests_cache
from multiprocessing.pool import ThreadPool

urls = ['http://www.google.com', ...]
with requests_cache.enabled():
    responses = ThreadPool(100).map(requests.get, urls)
However, many of the requests fail with:
sqlite3.OperationalError: database is locked
Clearly, too many threads are accessing the cache at the same time.
So, does requests_cache support some kind of transaction, so that writes to the cache only happen once all threads have finished? For instance:
with requests_cache.enabled():
    with requests_cache.transaction():
        responses = ThreadPool(100).map(requests.get, urls)