How to scrape a table with rvest and XPath?

Question

Using the following documentation, I tried to scrape a series of tables from marketwatch.com.

Here is the code; the link and XPath are already included:

    url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
    valuation <- url %>%
      html() %>%
      html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
      html_table()
    valuation <- valuation[[1]]

I get the following warning:

 Warning message: 'html' is deprecated. Use 'read_html' instead. See help("Deprecated")

Thanks in advance.

r xpath web-scraping rvest

Alex Bădoi Feb 29 '16 at 19:06

1 answer

SymbolixAU · Accepted Answer · 2016-03-01T00:30:14+0000

This website does not use an HTML table, so html_table() cannot find anything. It actually uses div elements with the classes column and data lastcolumn.

So you can do something like

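    library(rvest)

    url <- "http://www.marketwatch.com/investing/stock/IRS/profile"

    # Nodes whose class attribute is exactly "column"
    valuation_col <- url %>%
      read_html() %>%
      html_nodes(xpath='//*[@class="column"]')

    # Nodes whose class attribute is exactly "data lastcolumn"
    valuation_data <- url %>%
      read_html() %>%
      html_nodes(xpath='//*[@class="data lastcolumn"]')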

Or even

 url %>% read_html() %>% html_nodes(xpath='//*[@class="section"]')

to get you most of the way there.
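
For the remaining step, here is a rough sketch of how the selected nodes might be turned into a data frame with html_text(). The HTML snippet, the "Metric A"/"Metric B" labels and the numbers are made up for illustration and are not taken from the real page. It also assumes each "column" node pairs up one-to-one with a "data lastcolumn" node, which may not hold on the live site.

```r
library(rvest)

# Hypothetical markup for illustration only; the real page's structure may differ.
page <- read_html('
  <div class="section">
    <p class="column">Metric A</p>
    <p class="data lastcolumn">1.0</p>
  </div>
  <div class="section">
    <p class="column">Metric B</p>
    <p class="data lastcolumn">2.5</p>
  </div>')

# Pull the text out of each matched node, trimming surrounding whitespace
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Assumes labels and values line up one-to-one
valuation <- data.frame(label = labels, value = values,
                        stringsAsFactors = FALSE)
```

On the real page you would swap the inline snippet for read_html(url) and check that length(labels) equals length(values) before building the data frame.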

Also read their terms of use - especially 3.4.
