Unable to get the trading price using the Requests-HTML library

I wrote a script in Python to get the price of the latest trade from a JavaScript-rendered web page. I can get the content if I use Selenium, but my goal here is to avoid browser automation tools like Selenium, because the latest version of Requests-HTML should be able to render the JavaScript-generated content itself. However, I haven't been able to get it working. When I run the script, I get the error below. Any help on this would be greatly appreciated.

Website Address: webpage_link

The script I tried:

    import requests_html

    with requests_html.HTMLSession() as session:
        r = session.get('https://www.gdax.com/trade/LTC-EUR')
        js = r.html.render()
        item = js.find('.MarketInfo_market-num_1lAXs', first=True).text
        print(item)

This is the full traceback:

    Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
    handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
    Traceback (most recent call last):
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
        self._callback(*self._args)
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 52, in watchdog_cb
        self._timeout)
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
        raise error
    concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded
    Traceback (most recent call last):
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\experiment.py", line 6, in <module>
        item = js.find('.MarketInfo_market-num_1lAXs',first=True).text
    AttributeError: 'NoneType' object has no attribute 'find'
    Error in atexit._run_exitfuncs:
    Traceback (most recent call last):
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\shutil.py", line 387, in _rmtree_unsafe
        os.unlink(fullname)
    PermissionError: [WinError 5] Access is denied: 'C:\\Users\\ar\\.pyppeteer\\.dev_profile\\tmp1gng46sw\\CrashpadMetrics-active.pma'

The price I'm after is shown at the top of the page as "177.59 EUR Last trade price". I want to extract 177.59, or whatever the current price happens to be.

3 answers

You are running into several errors. The first is a navigation timeout, indicating that the page did not finish rendering in time:

    Exception in callback NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49
    handle: <Handle NavigatorWatcher.waitForNavigation.<locals>.watchdog_cb(<Task finishe...> result=None>) at C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py:49>
    Traceback (most recent call last):
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 145, in _run
        self._callback(*self._args)
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 52, in watchdog_cb
        self._timeout)
      File "C:\Users\ar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyppeteer\navigator_watcher.py", line 40, in _raise_error
        raise error
    concurrent.futures._base.TimeoutError: Navigation Timeout Exceeded: 3000 ms exceeded

This exception is raised in a background thread, which is why your code was not interrupted. Your page may or may not have finished loading; you can set a longer timeout, or give the browser a sleep period so it has time to process the AJAX responses.

Next, response.html.render() returns None . It loads the HTML into a headless Chromium browser, lets JavaScript render in that browser, and then copies the rendered HTML back into the response.html data structure in place, so there is nothing to return. As a result, js is set to None rather than a new HTML instance, which causes the second traceback.
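This follows a common Python convention: methods that mutate an object in place return None rather than the mutated object. list.sort() is the quickest way to see the same pattern:

```python
prices = [177.59, 169.81, 171.02]

# sort() mutates the list in place and returns None -- the same
# convention render() follows when it updates r.html in place.
result = prices.sort()

print(result)  # None
print(prices)  # [169.81, 171.02, 177.59]
```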

Use the existing response.html object to search after rendering:

    r.html.render()
    item = r.html.find('.MarketInfo_market-num_1lAXs', first=True)

Even then, there is most likely no such CSS class, because the last 5 characters of the class name are generated per page build, after the JSON data has been loaded over AJAX. This makes it hard to use CSS selectors to find the element in question.
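One way around the generated suffix, sketched here on a hypothetical markup snippet, is to match only the stable prefix of the class name. A plain regular expression over the rendered HTML (available as r.html.html after render()) can recover the full class name; depending on the selector engine, a CSS attribute prefix selector such as [class^="MarketInfo_market-num_"] may also work with find():

```python
import re

# Hypothetical snippet of the rendered page; the real suffix changes
# with each page build.
html = '<span class="MarketInfo_market-num_1lAXs">177.59 EUR</span>'

# Match the stable prefix and capture whatever suffix was generated.
pattern = re.compile(r'class="(MarketInfo_market-num_\w+)"')
classes = pattern.findall(html)

print(classes)  # ['MarketInfo_market-num_1lAXs']
```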

In addition, I found that without a sleep period the browser does not have time to fetch the AJAX resources and render the information you want. Give it, say, 10 seconds of sleep to do that work before copying out the HTML, and set a longer timeout (the default is 8 seconds) if you see network timeouts:

 r.html.render(timeout=10, sleep=10) 

You can also set timeout to 0 to disable the timeout and wait indefinitely for the page to load.

Hopefully a future API update will also provide a way to wait for network activity to settle.

You can use the bundled parse library (which Requests-HTML builds its search methods on) to find the matching CSS classes:

    # search for the generated CSS class suffixes
    suffixes = [m[0] for m in r.html.search_all('MarketInfo_market-num_{:w}')]
    for suffix in suffixes:
        # for each suffix, find all matching elements with that class
        items = r.html.find('.MarketInfo_market-num_{}'.format(suffix))
        for item in items:
            print(item.text)

Now we get the result:

    169.81 EUR + 1.01 % 18,420 LTC
    169.81 EUR + 1.01 % 18,420 LTC
    169.81 EUR + 1.01 % 18,420 LTC
    169.81 EUR + 1.01 % 18,420 LTC

Your last traceback shows that the Chromium user data path could not be cleaned up. The underlying Pyppeteer library runs the headless Chromium browser with a temporary user data path, and in your case the directory contains a resource that is still locked. You can ignore the error, or try to delete any leftover files in the .pyppeteer folder later.
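If the leftover profiles bother you, a best-effort cleanup with ignore_errors=True simply skips any files that are still locked, such as the CrashpadMetrics-active.pma file from your traceback. The .pyppeteer location below is an assumption based on the paths you posted:

```python
import shutil
from pathlib import Path

def clean_dev_profiles(pyppeteer_home: Path) -> None:
    """Best-effort removal of leftover Chromium dev profiles.

    ignore_errors=True skips anything that is still locked instead of
    raising PermissionError the way the atexit handler did.
    """
    profile_root = pyppeteer_home / ".dev_profile"
    if profile_root.exists():
        for tmpdir in profile_root.iterdir():
            shutil.rmtree(tmpdir, ignore_errors=True)
```

For example, call clean_dev_profiles(Path.home() / ".pyppeteer") once no browser is running.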


If you are open to a different method, here is a Selenium-based scraper:

    from selenium import webdriver

    chrome_path = r"C:\Users\Mike\Desktop\chromedriver.exe"

    driver = webdriver.Chrome(chrome_path)
    driver.get("https://www.gdax.com/trade/LTC-EUR")
    item = driver.find_element_by_xpath("//span[@class='MarketInfo_market-num_1lAXs']")
    print(item.text)
    driver.close()

result: 177.60 EUR
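The plain XPath lookup can race the page's JavaScript, so an explicit wait makes this less fragile. A sketch, assuming chromedriver is on PATH and that the class prefix is stable across builds; parse_price is a hypothetical helper that just strips the currency label, and the Selenium imports are kept inside the main guard so the helper stands on its own:

```python
def parse_price(label):
    """Extract the numeric part from a label like '177.60 EUR' (hypothetical helper)."""
    return label.split()[0]

if __name__ == "__main__":
    # Selenium imports are local so parse_price is usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumes chromedriver is on PATH
    try:
        driver.get("https://www.gdax.com/trade/LTC-EUR")
        # Wait up to 10 s for any element whose class starts with the
        # stable prefix, sidestepping the generated suffix.
        span = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, '[class^="MarketInfo_market-num_"]')
            )
        )
        print(parse_price(span.text))
    finally:
        driver.quit()
```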


Do you really need this to go through Requests-HTML? When you posted, the repo was only 4 days old, and it has seen about 50 commits in the 3 days since. It will not be fully stable for some time.

See here: https://github.com/kennethreitz/requests-html/graphs/commit-activity

OTOH, there is an API for GDAX:

https://docs.gdax.com/#market-data
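For example, the public ticker endpoint from those market-data docs can be queried with nothing but the standard library. A sketch, assuming the endpoint still lives at api.gdax.com and returns a JSON object with a price field, as the docs describe:

```python
import json
from urllib.request import urlopen

def ticker_url(product_id):
    """Build the public GDAX ticker endpoint for a product like 'LTC-EUR'."""
    return "https://api.gdax.com/products/{}/ticker".format(product_id)

if __name__ == "__main__":
    # Live request; per the docs, the 'price' field holds the last trade price.
    with urlopen(ticker_url("LTC-EUR")) as resp:
        ticker = json.loads(resp.read().decode("utf-8"))
    print(ticker["price"])
```

No JavaScript rendering or browser is involved at all, which is far more robust than scraping the trade page.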

And if you are set on using Python 3, there is a Python client for GDAX. Note that it is an unofficial client; however, with it you can quickly and easily get responses from the official GDAX API.

https://github.com/danpaquin/gdax-python


Source: https://habr.com/ru/post/1275617/

