I am using Scrapy to get some book data from amazon.com. I only want the title, author, and prices of each book. I want to do this by category, for example books on computer science.
Consider this piece of HTML (from an Amazon page):
<div class="a-row">
::before
<div class="a-column a-span7">
<div class="a-row a-spacing-none">...</div>
<div class="a-row a-spacing-none">...</div>
<hr class="a-divider-normal s-result-divier">
<div class="a-row a-spacing-none">...</div>
<div class="a-row a-spacing-none">...</div>
<div class="a-row a-spacing-none">...</div>
</div>
<div class="a-column a-span5 a-span-last"></div>
::after
</div>
So I tried to get the div elements inside div[@class="a-column a-span7"], but only the first two div elements are returned. The commands I used were:
>>> books = response.selector.xpath('.//div[@class="a-fixed-left-grid-col a-col-right"]')
>>> abook = books[0].xpath('.//div[@class="a-row"]')
>>> prices = abook.xpath('.//div[@class="a-column a-span7"]')
>>> len(prices.xpath('div'))
2
The above code does the following:
- Get all div elements containing book information on a specific page
- Get the first “book” and get a div that contains the prices of the book.
- Get the div with class "a-column a-span7".
- Here's the problem: I don't understand why the number of div elements inside the div with class "a-column a-span7" is 2.
Only the div elements before the <hr> are returned, as if Scrapy stops at the <hr> tag. For example:
>>> abook.xpath ('div')
[<Selector xpath='div' data=u'<div class="a-column a-span7"><div class'>, <Selector xpath='div' data=u'<div class="a-column a-span5 a-span-last'>]
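One way to narrow this down is to count the child div elements in the raw HTML without going through Scrapy's selectors at all. Below is a minimal sketch using only Python's standard html.parser module. The names ChildDivCounter and count_child_divs are made up for illustration, and SAMPLE is just the snippet from above with the browser-only ::before and ::after lines removed. It assumes the tags are properly nested. In a real session you would feed it the page source that Scrapy actually downloaded (for instance a saved copy of response.body), which may differ from what the browser inspector shows.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ChildDivCounter(HTMLParser):
    """Count the <div> elements that are direct children of the first <div>
    whose class attribute equals target_class.

    Illustrative helper, not part of Scrapy. Assumes properly nested tags.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.stack = []      # one entry per open element; True marks the target div
        self.found = False   # only the first matching div is used as the target
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # e.g. <hr>: has no children and no closing tag
        # A div whose parent (top of the stack) is the target is a direct child.
        if tag == "div" and self.stack and self.stack[-1]:
            self.count += 1
        is_target = (not self.found and tag == "div"
                     and dict(attrs).get("class") == self.target_class)
        if is_target:
            self.found = True
        self.stack.append(is_target)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def count_child_divs(html, target_class):
    """Return the number of direct child divs of the target div in html."""
    parser = ChildDivCounter(target_class)
    parser.feed(html)
    parser.close()
    return parser.count


# The snippet from the question, without the ::before / ::after pseudo-elements.
SAMPLE = """
<div class="a-row">
  <div class="a-column a-span7">
    <div class="a-row a-spacing-none">...</div>
    <div class="a-row a-spacing-none">...</div>
    <hr class="a-divider-normal s-result-divier">
    <div class="a-row a-spacing-none">...</div>
    <div class="a-row a-spacing-none">...</div>
    <div class="a-row a-spacing-none">...</div>
  </div>
  <div class="a-column a-span5 a-span-last"></div>
</div>
"""

if __name__ == "__main__":
    print(count_child_divs(SAMPLE, "a-column a-span7"))
```

If this count is larger than what prices.xpath('div') returns for the same downloaded page, the difference probably comes from how the HTML is parsed. If both give 2, the downloaded page itself likely differs from what the browser shows.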
: stackref. use < <hr> , .