Follow page redirects using rvest in R

I am new to R and rvest. I am trying to use them to retrieve information from a website (www.medicinescomplete.com) that lets you log in through the Athens academic login system. In the browser, clicking the Athens login button takes you to the Athens login form. After you submit your credentials, the form redirects the browser back to the original site, now logged in.

I used the submit_form() function to send the credentials to the Athens form, and it returns a 200 status code. However, R does not follow the redirect as the browser does, and if I use jump_to() to return to the original site, I am not logged in. I suspect that the redirect link returned by the sign-in page may contain the login credentials I need, but I don't know how to find that link and submit it using rvest.

Has anyone worked out how to log in through Athens using rvest, or does anyone have an idea of how to get it to follow the automatic redirect?

The code I used, with the credentials changed, is:

    library(rvest)
    library(magrittr)

    url <- "https://www.medicinescomplete.com/about/"
    mcsession <- html_session(url)
    mcsession <- jump_to(mcsession, "/mc/athens.htm?uri=https%3A%2F%2Fwww.medicinescomplete.com%2Fabout%2F")
    athensform <- html_form(mcsession)[[1]]
    athensform <- set_values(athensform, ath_uname = "xxx", ath_passwd = "yyy")
    submit_form(mcsession, athensform)
    jump_to(mcsession, "https://www.medicinescomplete.com/mc/bnf/current/")

I get a 200 status code for the submit_form() step, but a 403 Forbidden for the last line, jump_to().

I then passed the result of the submit_form() step to html() and printed it. As far as I can tell, the login was successful, but there is a section in the body of the returned page that refers to a redirect back to the original site. The HTML for the entire page is too long to post, but the relevant part is as follows:

    <div style="padding: 8px;" id="logindiv">
    <form method="POST" action="https://www.medicinescomplete.com/mc/athens">
    Please wait while we transfer you.
    <br><noscript>JavaScript disabled, please<input type="submit" value="click here" style="border:none;background:none;text-decoration:underline;color:#E27B2F;">

I wonder whether this next part acts as some kind of login key:

    <input type="hidden" name="TARGET" value="https://www.medicinescomplete.com/about/" style="display:none">
    <input type="hidden" name="RelayState" value="https://www.medicinescomplete.com/about/" style="display:none">
    <input type="hidden" name="SAMLResponse" value="PFJlc3BvbnNlIHhtbG5zPSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FNTDoyLjA6cHJvdG9jb2wiIHhtbG5zOnNhbWwyPSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FNTDoyLjA6YXNzZXJ0aW9uIiBEZXN...

The following is also on the page:

    <script>
    window.onload = function() {
      document.forms[0].submit();
    }
    </script>

I think this script is designed to automatically submit another form, which sends a POST to the original medicinescomplete.com site and authenticates using the hidden fields as login credentials. However, when I try to use submit_form() on this page, I don't seem to get any further. I added the following line to try to figure out what is happening:

 > submit_form(mcsession, athensform) %>% html_form() %>% str() 

And this gives the following result:

    Submitting with 'submit'
    List of 1
     $ :List of 5
      ..$ name   : chr "<unnamed>"
      ..$ method : chr "POST"
      ..$ url    : chr "https://www.medicinescomplete.com/mc/athens"
      ..$ enctype: chr "form"
      ..$ fields :List of 4
      .. ..$ NULL        :List of 7
      .. .. ..$ name    : NULL
      .. .. ..$ type    : chr "submit"
      .. .. ..$ value   : chr "click here"
      .. .. ..$ checked : NULL
      .. .. ..$ disabled: NULL
      .. .. ..$ readonly: NULL
      .. .. ..$ required: logi FALSE
      .. .. ..- attr(*, "class")= chr "input"
      .. ..$ TARGET      :List of 7
      .. .. ..$ name    : chr "TARGET"
      .. .. ..$ type    : chr "hidden"
      .. .. ..$ value   : chr "https://www.medicinescomplete.com/about/"
      .. .. ..$ checked : NULL
      .. .. ..$ disabled: NULL
      .. .. ..$ readonly: NULL
      .. .. ..$ required: logi FALSE
      .. .. ..- attr(*, "class")= chr "input"
      .. ..$ RelayState  :List of 7
      .. .. ..$ name    : chr "RelayState"
      .. .. ..$ type    : chr "hidden"
      .. .. ..$ value   : chr "https://www.medicinescomplete.com/about/"
      .. .. ..$ checked : NULL
      .. .. ..$ disabled: NULL
      .. .. ..$ readonly: NULL
      .. .. ..$ required: logi FALSE
      .. .. ..- attr(*, "class")= chr "input"
      .. ..$ SAMLResponse:List of 7
      .. .. ..$ name    : chr "SAMLResponse"
      .. .. ..$ type    : chr "hidden"
      .. .. ..$ value   : chr "PFJlc3BvbnNlIHhtbG5zPSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FNTDoyLjA6cHJvdG9jb2wiIHhtbG5zOnNhbWwyPSJ1cm46b2FzaXM6bmFtZXM6dGM6U0FNTDoyLjA"| __truncated__
      .. .. ..$ checked : NULL
      .. .. ..$ disabled: NULL
      .. .. ..$ readonly: NULL
      .. .. ..$ required: logi FALSE
      .. .. ..- attr(*, "class")= chr "input"
      .. ..- attr(*, "class")= chr "fields"
      ..- attr(*, "class")= chr "form"

It seems to me that the information in this form should let me log in to the original site, but I don't quite understand how to use it. When I try to call submit_form() again with this form, it does not work. I tried this:

 submit_form(mcsession, athensform) %>% html_form() %>% submit_form(mcsession, .) %>% html() 

And got the following:

    Submitting with 'submit'
    Submitting with ''
    Error in if (!(submit %in% names(submits))) { :
      argument is of length zero
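One thing the `str()` output above shows is that `html_form()` returns a list of forms (here `List of 1`), whereas `submit_form()` expects a single form, so piping the whole list in is a likely cause of the `argument is of length zero` error. A minimal sketch of an alternative: pick the form out with `[[1]]`, collect its hidden fields, and POST them with httr, which is roughly what the page's `window.onload` script does in a browser. The helper names `hidden_fields()` and `post_hidden_form()` are hypothetical, the mock form only mirrors the structure printed above (its `SAMLResponse` value is a placeholder), and reusing `session$handle` to share cookies is an assumption about how the rvest session object is laid out. None of this has been tested against Athens.

```r
# Collect name/value pairs of the hidden inputs in a parsed form.
# Assumes the form structure shown by str(): form$fields is a list of
# inputs, each with $name, $type and $value.
hidden_fields <- function(form) {
  is_hidden <- vapply(
    form$fields,
    function(f) identical(f$type, "hidden"),
    logical(1)
  )
  hidden <- form$fields[is_hidden]
  values <- lapply(hidden, function(f) f$value)
  names(values) <- vapply(hidden, function(f) f$name, character(1))
  values
}

# POST the hidden fields to the form's action URL.
# ASSUMPTION: session$handle is the httr handle that holds the cookies.
post_hidden_form <- function(session, form) {
  httr::POST(
    form$url,
    body   = hidden_fields(form),
    encode = "form",
    handle = session$handle
  )
}

# Mock form mirroring the str() output; the SAMLResponse is a placeholder.
mock_form <- list(
  method = "POST",
  url    = "https://www.medicinescomplete.com/mc/athens",
  fields = list(
    list(name = NULL, type = "submit", value = "click here"),
    list(name = "TARGET", type = "hidden",
         value = "https://www.medicinescomplete.com/about/"),
    list(name = "RelayState", type = "hidden",
         value = "https://www.medicinescomplete.com/about/"),
    list(name = "SAMLResponse", type = "hidden",
         value = "<base64 SAML response placeholder>")
  )
)
print(names(hidden_fields(mock_form)))

# Usage against the real site (not run):
# resp      <- submit_form(mcsession, athensform)
# saml_form <- html_form(resp)[[1]]   # a single form, not the list
# result    <- post_hidden_form(mcsession, saml_form)
# httr::status_code(result)
```

If the POST succeeds and the cookies really are shared through the handle, a later `jump_to()` on the same session might then be treated as logged in, but that depends on the site.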
1 answer

This is most likely due to an issue in httr that prevents it from issuing the correct GET request when following a redirect.

It is hard to be sure, because the question lacks a reproducible example and the full verbose output of the request.

The workaround is to prevent redirection with:

 rvest::submit_form(..., httr::config(followlocation = FALSE)) 
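As a sketch of how this workaround might slot into the question's code: the helper below (a hypothetical name, `submit_without_redirect()`) submits the form with redirects disabled and returns the `Location` header, if any, so that it can be followed by hand with `jump_to()`. It assumes the rvest 0.3-era signature `submit_form(session, form, submit = NULL, ...)`, which is why `submit = NULL` is passed explicitly so that the config object lands in `...`, and it assumes the session keeps the raw httr response in `$response`.

```r
# Submit a form without letting curl follow redirects automatically.
# ASSUMPTIONS: rvest 0.3-style submit_form(), and a session object that
# stores the httr response in $response.
submit_without_redirect <- function(session, form) {
  result <- rvest::submit_form(
    session, form,
    submit = NULL,                         # keep the config out of `submit`
    httr::config(followlocation = FALSE)   # do not auto-follow 3xx responses
  )
  list(
    session  = result,
    # NULL when the server did not send a Location header
    location = httr::headers(result$response)$location
  )
}

# Usage (not run):
# out <- submit_without_redirect(mcsession, athensform)
# if (!is.null(out$location)) {
#   mcsession <- rvest::jump_to(out$session, out$location)
# }
```

If `location` comes back `NULL`, the server did not issue an HTTP redirect at that step, and the hand-off is probably done by a form on the returned page instead.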

Source: https://habr.com/ru/post/984550/

