How to reuse a session to avoid re-logging in when scraping with rvest?

I have written some code to scrape traffic data, based on this thread. I need to scrape many pages after logging in, but at the moment my code logs in to the site again for every URL. How can I reuse a session to avoid re-logging in, so that the code runs faster? Here's the pseudocode:

    generateURL <- function(siteID) {
      # pseudocode: build and return the URL for this site
      return(siteURL)
    }
    scrapeContent <- function(siteURL, session, filled_form) {
      # pseudocode: fetch siteURL and return its content
      return(content)
    }

    mainPageURL <- 'http://pems.dot.ca.gov/'
    pgsession <- html_session(mainPageURL)
    pgform <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform, 'username' = 'myUserName', 'password' = 'myPW')

    siteIDList <- c(1, 2, 3)
    vectorOfContent <- vector(mode = 'list', length = 3)  # to store all the content
    i <- 1
    for (siteID in siteIDList) {
      url <- generateURL(siteID)
      content <- scrapeContent(url, pgsession, filled_form)
      vectorOfContent[[i]] <- content
      i <- i + 1
    }

I read the rvest documentation, but it has no details on this. My question is: how can I reuse a session to avoid re-logging in? Thank you.

1 answer

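Submit the filled form once, keep the session it returns, and navigate within that session for every URL. You can do something like this: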

    library(rvest)

    pgsession <- html_session(mainPageURL)
    pgform <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform, 'username' = 'myUserName', 'password' = 'myPW')
    # submit the filled form once; s is your logged-in session
    s <- submit_form(pgsession, filled_form)

    vectorOfContent <- vector(mode = 'list', length = length(siteIDList))
    for (i in seq_along(siteIDList)) {
      url <- generateURL(siteIDList[[i]])
      # jump_to navigates within the session, read_html parses the html
      vectorOfContent[[i]] <- s %>% jump_to(url) %>% read_html()
    }
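If you are on rvest 1.0.0 or later, the functions used above have been renamed (`html_session` to `session`, `set_values` to `html_form_set`, `submit_form` to `session_submit`, `jump_to` to `session_jump_to`). Here is a minimal sketch of the same idea with the newer names. The helper names `login_once` and `scrape_pages` are made up for illustration, and it assumes, as in the question, that the first form on the page is the login form with fields `username` and `password`:

```r
library(rvest)

# Hypothetical helper: log in once and return the authenticated session.
# Assumes the first form on the page is the login form.
login_once <- function(main_url, user, pw) {
  s <- session(main_url)
  form <- html_form(s)[[1]]
  filled <- html_form_set(form, username = user, password = pw)
  session_submit(s, filled)
}

# Hypothetical helper: visit each URL through the same session,
# so the login cookies are reused instead of logging in again.
scrape_pages <- function(s, urls) {
  lapply(urls, function(u) read_html(session_jump_to(s, u)))
}

# Usage (not run here; needs network access and valid credentials):
# s <- login_once("http://pems.dot.ca.gov/", "myUserName", "myPW")
# urls <- vapply(siteIDList, generateURL, character(1))
# vectorOfContent <- scrape_pages(s, urls)
```

Whether the site keeps you logged in between requests depends on how it handles cookies, so check one page by hand before scraping the whole list.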

Source: https://habr.com/ru/post/1011610/