I am trying to download traffic data from pems.dot.ca.gov by following this section.
rm(list=ls())
library(rvest)
library(xml2)
library(httr)

url <- "http://pems.dot.ca.gov/?report_form=1&dnode=tmgs&content=tmg_volumes&tab=tmg_vol_ts&export=&tmg_station_id=74250&s_time_id=1369094400&s_time_id_f=05%2F21%2F2013&e_time_id=1371772740&e_time_id_f=06%2F20%2F2013&tod=all&tod_from=0&tod_to=0&dow_5=on&dow_6=on&tmg_sub_id=all&q=obs_flow&gn=hour&html.x=34&html.y=8"

pgsession <- html_session(url)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform,
                          'username' = 'omitted',
                          'password' = 'omitted')

resp = submit_form(pgsession, filled_form)
resp_2 = resp$response
cont = resp_2$content
I checked the class() of these elements and found that resp is a "session", resp_2 is a "response", and cont is "raw".

My question is: how can I extract the HTML content correctly so that I can continue with XPath to select the actual data I want from this page? My intuition is that I have to parse resp_2, which is the response, but I just can't get it to work. Your help is greatly appreciated!
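One possible direction, sketched under assumptions rather than tested against the real site: a "raw" vector like cont can be decoded to text and handed to xml2::read_html(), after which XPath queries work as usual. The HTML string and the XPath below are made up for illustration and only stand in for the actual PeMS page.

```r
library(xml2)

# Stand-in for resp_2$content: a raw vector holding an HTML page.
# The table markup is hypothetical, not the real PeMS output.
cont <- charToRaw("<html><body><table><tr><td>101</td></tr><tr><td>202</td></tr></table></body></html>")

# Decode the raw bytes to a character string, then parse it as HTML.
page <- read_html(rawToChar(cont))

# Select nodes with XPath and extract their text.
cells <- xml_text(xml_find_all(page, "//table//td"))
print(cells)
```

With the real response, httr::content(resp_2, as = "text") should give the same kind of string to pass to read_html(); the XPath would need to match the actual table on the page.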