In my project, I use the multiprocessing module to run tasks in parallel. I want to use threading instead, as it has better performance (my tasks are TCP/IP related, not CPU- or I/O-bound). multiprocessing has great features such as Pool.imap_unordered and Pool.map_async, which do not exist in the threading module.
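For illustration, here is a minimal sketch of what I am trying to end up with, using the (undocumented) multiprocessing.pool.ThreadPool, which is what multiprocessing.dummy.Pool returns under the hood; fetch is just a placeholder for my network-bound task:

from multiprocessing.pool import ThreadPool  # thread-backed, same Pool API

def fetch(url):
    # placeholder for a network-bound task
    return url

pool = ThreadPool(processes=4)
# imap_unordered yields each result as soon as its worker finishes
for result in pool.imap_unordered(fetch, ["http://a.example", "http://b.example"]):
    print result
pool.close()
pool.join()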
What is the correct way to convert my code to use threading instead? The documentation introduces the multiprocessing.dummy module, which is a wrapper around the threading module. However, this raises a lot of errors (at least on Python 2.7.3):
    pool = multiprocessing.Pool(processes)
  File "C:\python27\lib\multiprocessing\dummy\__init__.py", line 150, in Pool
    return ThreadPool(processes, initializer, initargs)
  File "C:\python27\lib\multiprocessing\pool.py", line 685, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "C:\python27\lib\multiprocessing\pool.py", line 136, in __init__
    self._repopulate_pool()
  File "C:\python27\lib\multiprocessing\pool.py", line 199, in _repopulate_pool
    w.start()
  File "C:\python27\lib\multiprocessing\dummy\__init__.py", line 73, in start
    self._parent._children[self] = None
AttributeError: '_DummyThread' object has no attribute '_children'
Edit: What actually happens is that I have a GUI that launches another thread (to prevent the GUI from freezing). That thread runs a specific search function that creates a ThreadPool, and that is what fails.
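A minimal sketch that reproduces the failure, assuming that setup (the names here are illustrative; the point is that the pool is created in a non-main thread):

import threading
import multiprocessing.dummy as multiprocessing

def search():
    # Creating the pool inside a non-main thread triggers the
    # AttributeError above on Python 2.7.3.
    pool = multiprocessing.Pool(4)
    pool.close()
    pool.join()

worker = threading.Thread(target=search)  # stands in for the GUI's thread
worker.start()
worker.join()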
Edit 2: The bug has been fixed and the fix will be included in future releases.
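Until a release with the fix is out, the following workaround sketch appears to help (an assumption derived from the traceback above: the creating thread merely lacks the _children mapping that multiprocessing.dummy expects on its parent):

import threading
import weakref
import multiprocessing.dummy as multiprocessing

# Give the current (non-main) thread the attribute that
# multiprocessing.dummy's Process.start() looks up on its parent.
current = threading.current_thread()
if not hasattr(current, '_children'):
    current._children = weakref.WeakKeyDictionary()

pool = multiprocessing.Pool(4)  # no longer raises AttributeError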
import urllib2, htmllib, formatter
import multiprocessing.dummy as multiprocessing
import xml.dom.minidom
import os
import string, random
from urlparse import parse_qs, urlparse

from useful_util import retry
import config
from logger import log


class LinksExtractor(htmllib.HTMLParser):
    def __init__(self, formatter):
        htmllib.HTMLParser.__init__(self, formatter)
        self.links = []
        self.ignoredSites = config.WebParser_ignoredSites

    def start_a(self, attrs):
        for attr in attrs:
            if attr[0] == "href" and attr[1].endswith(".mp3"):
                if not filter(lambda x: (x in attr[1]), self.ignoredSites):
                    self.links.append(attr[1])

    def get_links(self):
        return self.links


def GetLinks(url, returnMetaUrlObj=False):
    '''
    Function gathers links from a url.
    @param url: Url Address.
    @param returnMetaUrlObj: If true, returns a MetaUrl Object list.
                             Else, returns a string list. Default is False.
    @return links: The links found.
    '''
    htmlparser = LinksExtractor(formatter.NullFormatter())
    try:
        data = urllib2.urlopen(url)
    except (urllib2.HTTPError, urllib2.URLError) as e:
        log.error(e)
        return []
    htmlparser.feed(data.read())
    htmlparser.close()
    links = list(set(htmlparser.get_links()))
    if returnMetaUrlObj:
        links = map(MetaUrl, links)
    return links


def isAscii(s):
    "Function checks if the string is ascii."
    try:
        s.decode('ascii')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


@retry(Exception, logger=log)
def parse(song, source):
    '''
    Function parses the source search page and returns the .mp3 links in it.
    @param song: Search string.
    @param source: Search website source. Value can be dilandau, mp3skull,
                   youtube, seekasong.
    @return links: .mp3 url links.
    '''
    source = source.lower()
    if source == "dilandau":
        return parse_dilandau(song)
    elif source == "mp3skull":
        return parse_Mp3skull(song)
    elif source == "seekasong":
        return parse_SeekASong(song)
    elif source == "youtube":
        return parse_Youtube(song)
    log.error('no source "%s". (from parse function in WebParser)' % source)
    return []


def parse_dilandau(song, pages=1):
    "Function connects to Dilandau.eu and returns the .mp3 links in it"
    if not isAscii(song):