Using Python requests.get to parse HTML that doesn't load right away

I am trying to write a Python script that will periodically check a website to see whether an item is available. In the past, I have used requests.get, lxml.html, and XPath to automate web searches. In the case of this URL ( http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/ ) and others on the same site, my code did not work.

import requests
from lxml import html
page = requests.get("http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/")
tree = html.fromstring(page.text)
html_element = tree.xpath(".//div[@class='product-soldout ng-scope']")

At this point, html_element should be a list of elements (I think only one in this case), but instead it is empty. I think this is because the website does not load everything at once, so when requests.get() fetches the page it only captures the first part. So my questions are: 1) Am I correct in my assessment of the problem? and 2) If so, is there a way to make requests.get() wait before returning the HTML, or perhaps another route to get the whole page?
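
For example, reusing page from the snippet above, a quick sanity check suggests the class name never appears in the downloaded text (I would expect this to print False if the div is added later):

# if the div is injected after load, its class name should not
# appear anywhere in the raw response text
print("product-soldout" in page.text)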

thanks

Edit: thanks to both answers. I used Selenium and got a script working.

+3
2 answers

You are not right in your assessment of the problem.

The response you are getting back is complete; it ends with </html>, so the server is not cutting the page off partway through.

page.text contains the entire document the server sent; requests is not truncating anything either.

What requests returns is only the static HTML; it does not execute JavaScript. The div you are looking for is added to the DOM by JavaScript that runs in the browser after the page loads, which is why it is missing from the HTML you downloaded and your XPath finds nothing.

There is no way to make requests.get() "wait" for that content. Your options are:

  • drive a real (or headless) browser with selenium, which does execute the JavaScript (see the sketch after this list);
  • use your browser's developer tools to find the requests the JavaScript makes for the product data, and reproduce those requests directly from Python;
  • use a JavaScript-capable engine to render the DOM and scrape the result.
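
A minimal sketch of the selenium option, reusing the XPath from the question; the driver choice, the wait condition, and the 10-second timeout are arbitrary choices you may need to adjust:

from lxml import html
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/"

browser = webdriver.Firefox()  # any driver that executes the page's JavaScript will do
try:
    browser.get(url)
    # wait (up to 10 s) until the JavaScript has inserted the div
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, "//div[@class='product-soldout ng-scope']"))
    )
    # hand the rendered HTML to lxml and reuse the original XPath
    tree = html.fromstring(browser.page_source)
    elements = tree.xpath(".//div[@class='product-soldout ng-scope']")
    print(len(elements))
finally:
    browser.quit()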
+6

The page is built by JavaScript, so the HTML that requests downloads is not the HTML you see in the browser once the JavaScript has run. To execute that JavaScript and get the rendered HTML, you can use selenium with PhantomJS and read the page source:

from selenium import webdriver

browser = webdriver.PhantomJS()
browser.get("http://www.anthropologie.eu/anthro/index.jsp#/")
html = browser.page_source
print(html)
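
From there you can hand the rendered source back to lxml and reuse the XPath from the question; a short sketch, assuming browser.get() was pointed at the product URL from the question rather than the index page:

import lxml.html

# parse the JavaScript-rendered source, not the raw requests download
tree = lxml.html.fromstring(browser.page_source)
sold_out = tree.xpath(".//div[@class='product-soldout ng-scope']")
print(len(sold_out))
browser.quit()  # shut down the PhantomJS process when finished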
+3

Source: https://habr.com/ru/post/1624142/

