I have a Python 2.7 program that fetches data from websites and uploads the results to a database. It follows the producer-consumer model and is written using the threading module.
Just for fun, I would like to rewrite this program using the new asyncio module (from 3.4), but I cannot figure out how to do this correctly.
The most important requirement is that the program must fetch data from the same website in sequential order. For example, for the URL 'http://a-restaurant.com' it should first fetch 'http://a-restaurant.com/menu/0', then 'http://a-restaurant.com/menu/1', then 'http://a-restaurant.com/menu/2', and so on. If they are not fetched in order, the website stops delivering pages altogether and you have to start again from 0.
However, the fetches for another website ('http://another-restaurant.com') can (and should) run at the same time (the other sites have the same sequential restriction).
The threading module suits this well: I can create a separate thread for each website, and each thread can wait until one page has finished loading before fetching the next one.
Here is a heavily simplified code snippet from the threading version (Python 2.7):
import threading
import urllib2
import Queue

class FetchThread(threading.Thread):
    def __init__(self, queue, url):
        threading.Thread.__init__(self)
        self.queue = queue
        self.baseurl = url
        ...
    def run(self):
        # Pages of one site are fetched strictly one after another:
        # each urlopen() call blocks until the page has finished loading.
        for food in range(10):
            url = self.baseurl + '/' + str(food)
            text = urllib2.urlopen(url).read()
            self.queue.put(text)
        ...

def main():
    queue = Queue.Queue()
    urls = ('http://a-restaurant.com/menu', 'http://another-restaurant.com/menu')
    for url in urls:
        fetcher = FetchThread(queue, url)
        fetcher.start()
    ...
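The consumer half of the producer-consumer pair is hidden behind the '...' and is not really part of the question; roughly, it just drains the queue and writes to the database. A minimal sketch, with the database call stubbed out because the real code does not matter here:

def save_to_database(text):
    pass  # stand-in: the real program uploads the page to a database

def consume(queue):
    while True:
        text = queue.get()  # blocks until one of the FetchThreads produces a page
        save_to_database(text)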
And here is how I tried to do it using asyncio (in 3.4.1):
import asyncio
import aiohttp

@asyncio.coroutine
def fetch(url):
    response = yield from aiohttp.request('GET', url)
    body = yield from response.read_and_close()
    return body.decode('utf-8')

@asyncio.coroutine
def print_page(url):
    page = yield from fetch(url)
    print(page)

l = []
urls = ('http://a-restaurant.com/menu', 'http://another-restaurant.com/menu')
for url in urls:
    # This schedules all pages of all restaurants at once.
    for food in range(10):
        menu_url = url + '/' + str(food)
        l.append(print_page(menu_url))

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(l))
And it fetches and prints everything in some arbitrary order. Well, I guess that is the whole idea of these coroutines. Should I not use aiohttp and just fetch with urllib instead? But would the fetches for the first restaurant then block the fetches for the other restaurants? Am I just thinking about this completely wrong? (This is just a test to try fetching things in sequential order; I haven't gotten to the queue part yet.)
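For what it's worth, my current guess is that I need one coroutine per restaurant that waits for each page before requesting the next, and that only those per-restaurant coroutines should run concurrently. Something like this sketch, reusing fetch() from above (I have no idea whether this is the idiomatic way):

@asyncio.coroutine
def fetch_restaurant(baseurl):
    # Within one restaurant, menu/0 must complete before menu/1 is requested.
    for food in range(10):
        page = yield from fetch(baseurl + '/' + str(food))
        print(page)

urls = ('http://a-restaurant.com/menu', 'http://another-restaurant.com/menu')
# The restaurants themselves run concurrently as separate tasks.
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait([fetch_restaurant(url) for url in urls]))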