Error 403 when using rvest to log in to a site for scraping

I am trying to scrape a page on a website that requires a login, and I get a 403 error.

I adapted the code from these two posts for my site: "Using rvest or httr to log in to non-standard forms on a webpage" and "How to reuse a session to avoid repeated login when scraping with rvest?"

    library(rvest)

    pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1")
    pgform <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform, 'username' = 'user', 'password' = 'pass')
    s <- submit_form(pgsession, filled_form)
    # s is your logged-in session

When the code runs, I get this message:

    Submitting with 'NULL'
    Warning message:
    In request_POST(session, url = url, body = request$values, encode = request$encode,  :
      Forbidden (HTTP 403).

I also ran the code this way, setting user_agent as RS suggested in the comments, but I get the same error as above.

    library(rvest)
    library(httr)

    uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
    pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1", user_agent(uastring))
    pgform <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform, 'username' = 'user', 'password' = 'pass')
    s <- submit_form(pgsession, filled_form)
    # s is your logged-in session

If I pull the page without logging in, it shows a small data table, with the text "Available events: 65" at the bottom right.

After logging in, the table is fully populated with all 65 events, and that is what I want to download. I already have all the code for that part; I'm only stuck on the login.

Thank you for your help.

2 answers

Following RS's suggestion, I used RSelenium and was able to log in successfully.

A quick note for other Mac users running Chrome or PhantomJS: I'm on El Capitan, and macOS had trouble recognizing the paths to both driver binaries. I moved the binaries to /usr/local/bin instead, and they ran without problems.

Below is the code:

    library(RSelenium)

    # Start a Selenium server and open a Chrome session
    RSelenium::startServer()
    remDr <- remoteDriver(browserName = "chrome")
    remDr$open()

    # Log in: type the username, then the password followed by Enter
    appURL <- 'https://www.optionslam.com/accounts/login/'
    remDr$navigate(appURL)
    remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
    remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key = 'enter'))

    # Open the data page while still logged in
    appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
    remDr$navigate(appURL)

This can also be done with PhantomJS:

    library(RSelenium)

    pJS <- phantom() # start PhantomJS
    appURL <- 'https://www.optionslam.com/accounts/login/'
    remDr <- remoteDriver(browserName = "phantomjs")
    remDr$open()

    # Log in: type the username, then the password followed by Enter
    remDr$navigate(appURL)
    remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
    remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key = 'enter'))

    # Open the data page while still logged in
    appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
    remDr$navigate(appURL)
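startServer() and phantom() were deprecated in later RSelenium releases in favour of rsDriver(). A minimal sketch of the same Chrome login flow with the newer API might look like the code below. It assumes RSelenium 1.7 or later and a working local Chrome/ChromeDriver setup. The function name scrape_after_login() and its arguments are hypothetical. The page source is handed to rvest so existing table-parsing code can be reused.

```r
# Sketch only: assumes RSelenium >= 1.7 (rsDriver) and a local Chrome install.
# scrape_after_login() is a hypothetical helper, not part of any package.
scrape_after_login <- function(user, pass,
                               login_url = "https://www.optionslam.com/accounts/login/",
                               target_url = "https://www.optionslam.com/earnings/stocks/MSFT?page=-1") {
  # rsDriver() starts a Selenium server, opens a browser and
  # returns a list with $server and $client
  driver <- RSelenium::rsDriver(browser = "chrome", verbose = FALSE)
  remDr <- driver$client

  # Shut down the browser and the server even if a step below fails
  on.exit({
    remDr$close()
    driver$server$stop()
  }, add = TRUE)

  # Fill in the login form by element id, then press Enter to submit
  remDr$navigate(login_url)
  remDr$findElement("id", "id_username")$sendKeysToElement(list(user))
  remDr$findElement("id", "id_password")$sendKeysToElement(list(pass, key = "enter"))

  # The browser now holds the login cookies, so the full table should render
  remDr$navigate(target_url)
  page_source <- remDr$getPageSource()[[1]]

  # Parse the HTML with rvest so the existing scraping code can take over
  rvest::read_html(page_source)
}
```

Usage would be something like `page <- scrape_after_login("user", "password")` followed by `rvest::html_table(page)`. Closing the browser through on.exit() avoids leaving orphaned Selenium processes behind when a selector fails.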

Here is a solution to the problem in the original use case, using rvest:

    library(rvest)
    library(httr)

    uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
    pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1", user_agent(uastring))
    pgform <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform, username = 'un', password = 'ps')
    # Send the URL of the page that holds the form as the referer
    s <- submit_form(pgsession, filled_form, submit = NULL, config(referer = pgsession$url))
    # s is your logged-in session

The request has to tell the server which page it came from (the referer, sic); without it the server answers with 403. The key addition is:

    config(referer = pgsession$url)
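In rvest 1.0 and later, html_session(), set_values() and submit_form() were superseded by session(), html_form_set() and session_submit(). A rough equivalent of the fix with the newer names is sketched below. It assumes that session_submit() forwards extra arguments to httr, as submit_form() did. The helper name login_with_referer() and its default arguments are hypothetical.

```r
# Sketch assuming rvest >= 1.0 and httr; login_with_referer() is a hypothetical helper.
login_with_referer <- function(user, pass,
                               url = "https://www.optionslam.com/earnings/stocks/MSFT?page=-1",
                               ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36") {
  # Open the page with a browser-like user agent for the whole session
  pgsession <- rvest::session(url, httr::user_agent(ua))

  # Take the first form on the page and fill in the credentials
  pgform <- rvest::html_form(pgsession)[[1]]
  filled_form <- rvest::html_form_set(pgform, username = user, password = pass)

  # Extra unnamed arguments are passed on to httr as request config;
  # here they tell the server which page the form was submitted from
  rvest::session_submit(pgsession, filled_form,
                        config = httr::config(referer = pgsession$url))
}
```

The returned object is the logged-in session. It can then be passed to rvest::session_jump_to() to fetch further pages with the same cookies.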

Source: https://habr.com/ru/post/1011605/

