Python 2.5 - multi-threaded loop

I have a code snippet:

 for url in get_lines(file):
     visit(url, timeout=timeout)

It reads URLs from the file and visits each one (via urllib2) in a for loop.

Is it possible to do this in multiple threads? For example, 10 visits at a time.


I tried:

 for url in get_lines(file):
     Thread(target=visit, args=(url,), kwargs={"timeout": timeout}).start()

But this has no visible effect: the URLs are still visited as before.


Simplified version of the function:

 def visit(url, proxy_addr=None, timeout=30):
     (...)
     request = urllib2.Request(url)
     response = urllib2.urlopen(request)
     return response.read()
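For context, the "10 visits at a time" behavior is usually built as a fixed pool of worker threads draining a shared queue. Below is a minimal sketch of that pattern, with a hypothetical fetch() standing in for visit() so it is self-contained (note that in Python 2 the queue module is named Queue):

```python
# Sketch: a fixed pool of worker threads draining a URL queue,
# so at most num_workers downloads run at a time.
import threading
try:
    import queue           # Python 3
except ImportError:
    import Queue as queue  # Python 2

def fetch(url):
    # Hypothetical stand-in for visit(url, timeout=timeout)
    return "contents of " + url

def run_pool(urls, num_workers=10):
    q = queue.Queue()
    for url in urls:
        q.put(url)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = q.get_nowait()
            except queue.Empty:
                return          # queue drained, worker exits
            page = fetch(url)
            with lock:          # results list is shared between workers
                results.append(page)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

pages = run_pool(["http://example.com/%d" % i for i in range(25)], num_workers=10)
```

Because the workers are real threads started up front and joined at the end, the loop no longer blocks on one URL at a time.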
2 answers

To extend senderle's answer, you can use the Pool class from multiprocessing to make this easy:

 from multiprocessing import Pool
 pool = Pool(processes=5)
 pages = pool.map(visit, get_lines(file))

When map returns, pages will be a list of the URL contents. You can adjust the number of processes to whatever suits your system.


I suspect you have run into the Global Interpreter Lock. Basically, threading in Python cannot achieve true concurrency, which seems to be your goal. You need to use multiprocessing instead.

multiprocessing has roughly the same interface as threading, but with a few differences. I believe your visit function as described above should work correctly, because it is written in a functional style without side effects.

In multiprocessing, the Process class is the equivalent of the Thread class in threading. It has all the same methods, so it is a drop-in replacement in this case. (Though I suppose you could use a Pool as JoeZuntz suggests; I would first check whether the basic Process class fixes the problem.)


Source: https://habr.com/ru/post/984202/
