Using Python requests.get to parse HTML that doesn't load right away

I am trying to write a Python script that will periodically check a website to see whether an item is available. In the past, I have used requests.get, lxml.html, and XPath to automate web searches. In the case of this URL ( http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/ ) and others on the same site, my code did not work.

import requests
from lxml import html
page = requests.get("http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/")
tree = html.fromstring(page.text)
html_element = tree.xpath(".//div[@class='product-soldout ng-scope']")

At this point, html_element should be a list of elements (I think only one in this case), but instead it is empty. I think this is because the website does not load everything at once, so when requests.get() fetches the page it only captures the first part. So my questions are: 1) Am I correct in my assessment of the problem? and 2) If so, is there a way to make requests.get() wait before returning the HTML, or perhaps another route to get the whole page?
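
For example, reusing page from the snippet above, a quick sanity check suggests the class name never appears in the downloaded text (I would expect this to print False if the div is added later):

# if the div is injected after load, its class name should not
# appear anywhere in the raw response text
print("product-soldout" in page.text)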

thanks

Edit: thanks to both answers. I used Selenium and got a script working.

+3
2 answers

You are not right in your assessment of the problem.

The response you are getting back is complete; it ends with </html>, so the server is not cutting the page off partway through.

page.text contains the entire document the server sent; requests is not truncating anything either.

What requests returns is only the static HTML; it does not execute JavaScript. The div you are looking for is added to the DOM by JavaScript that runs in the browser after the page loads, which is why it is missing from the HTML you downloaded and your XPath finds nothing.

There is no way to make requests.get() "wait" for that content. Your options are:

  • drive a real (or headless) browser with selenium, which does execute the JavaScript (see the sketch after this list);
  • use your browser's developer tools to find the requests the JavaScript makes for the product data, and reproduce those requests directly from Python;
  • use a JavaScript-capable engine to render the DOM and scrape the result.
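
A minimal sketch of the selenium option, reusing the XPath from the question; the driver choice, the wait condition, and the 10-second timeout are arbitrary choices you may need to adjust:

from lxml import html
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/"

browser = webdriver.Firefox()  # any driver that executes the page's JavaScript will do
try:
    browser.get(url)
    # wait (up to 10 s) until the JavaScript has inserted the div
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, "//div[@class='product-soldout ng-scope']"))
    )
    # hand the rendered HTML to lxml and reuse the original XPath
    tree = html.fromstring(browser.page_source)
    elements = tree.xpath(".//div[@class='product-soldout ng-scope']")
    print(len(elements))
finally:
    browser.quit()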
+6

The page is built by JavaScript, so the HTML that requests downloads is not the HTML you see in the browser once the JavaScript has run. To execute that JavaScript and get the rendered HTML, you can use selenium with PhantomJS and read the page source:

from selenium import webdriver

browser = webdriver.PhantomJS()
browser.get("http://www.anthropologie.eu/anthro/index.jsp#/")
html = browser.page_source
print(html)
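
From there you can hand the rendered source back to lxml and reuse the XPath from the question; a short sketch, assuming browser.get() was pointed at the product URL from the question rather than the index page:

import lxml.html

# parse the JavaScript-rendered source, not the raw requests download
tree = lxml.html.fromstring(browser.page_source)
sold_out = tree.xpath(".//div[@class='product-soldout ng-scope']")
print(len(sold_out))
browser.quit()  # shut down the PhantomJS process when finished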
+3

Source: https://habr.com/ru/post/1624142/

