"RCURL" [R] package getURL webpage error scrambling API

I am trying to scrape data from an API using the getURL function of the RCurl package in R. My problem is that a request made from R does not reproduce the response I get when I open the same URL in Chrome. When I open the API page (URL below) in Chrome, it works fine, but when I request it with getURL in R (or open it in Chrome's incognito mode), I get "500 Internal Server Error" instead of the nicely formatted JSON I'm looking for.

URL / API in question: http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082

Here is my (unsuccessful) request in R:

test2 <- fromJSON(getURL("http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082", ssl.verifypeer = FALSE, useragent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36")) 

My research so far: First I reviewed this previous Stack Overflow question and added a user agent to my request (it didn't solve the problem, but may still be necessary): Problems with the ViralHeat API with the getURL() command in the RCurl package

Then I looked at this useful post, which guided my reasoning: R: The mismatch between the browser and GET / getURL

My ideas for a solution: This is not my area of expertise, but I assume the request is missing a cookie that the server needs (which would explain why it also fails in my browser in incognito mode). I compared the requests and responses from a successful request with those from a failed one:

Successful request: (screenshot not preserved)

Bad request: (screenshot not preserved)

Does anyone have any ideas? Should I try the RSelenium package, which MrFlick suggested in the second post mentioned above?

1 answer

This is a polite site. It wants to know where you came from, what currency you use, and so on, to give you the best user experience. It does this by setting a lot of cookies on the landing page. So we follow its lead: visit the landing page first to collect the cookies, then request the page we actually want:

    library(RCurl)
    myURL <- "http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082"
    agent <- "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"

    # Set RCurl parameters: keep cookies on the handle and follow redirects
    curl <- getCurlHandle()
    curlSetOpt(cookiejar = "cookies.txt", useragent = agent, followlocation = TRUE, curl = curl)

    # Visit the landing page first so the site can set its cookies
    firstPage <- getURL("http://www.bluenile.com", curl = curl)
    # Then request the API page with the same handle
    myPage <- getURL(myURL, curl = curl)

    library(RJSONIO)
    > names(fromJSON(myPage))
    [1] "diamondDetailsHeader" "diamondDetailsBodies" "pageMetadata"         "expandedUrl"
    [5] "newVersion"           "multiDiamond"
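Since a failed request comes back as an error page rather than JSON, it may be worth checking the body before passing it to fromJSON. A minimal sketch in base R; `looks_like_json` is a hypothetical helper (not part of RCurl or RJSONIO), and the two sample bodies are made up for illustration:

```r
# Hypothetical helper: TRUE when the text starts like a JSON object or array.
looks_like_json <- function(txt) {
  grepl("^\\s*[\\[{]", txt, perl = TRUE)
}

# Made-up sample bodies: one JSON-like, one resembling an HTML error page.
ok_body  <- "{\"pageMetadata\": {}}"
err_body <- "<html><body>500 Internal Server Error</body></html>"

looks_like_json(ok_body)
looks_like_json(err_body)
```

With the code above, one could guard the parse with something like `if (looks_like_json(myPage)) fromJSON(myPage) else stop("unexpected response")`. This only inspects the first non-blank character, so it is a rough filter, not a JSON validator.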

And the cookies the site set on the curl handle:

    > getCurlInfo(curl)$cookielist
     [1] ".bluenile.com\tTRUE\t/\tFALSE\t2412270275\tGUID\tDA5C11F5_E468_46B5_B4E8_D551D4D6EA4D"
     [2] ".bluenile.com\tTRUE\t/\tFALSE\t1475342275\tsplit\tver~3&presetFilters~TEST"
     [3] ".bluenile.com\tTRUE\t/\tFALSE\t1727630275\tsitetrack\tver~2&jse~0"
     [4] ".bluenile.com\tTRUE\t/\tFALSE\t1425230275\tpop\tver~2&china~false&french~false&ie~false&internationalSelect~false&iphoneApp~false&survey~false&uae~false"
     [5] ".bluenile.com\tTRUE\t/\tFALSE\t1475342275\tdsearch\tver~6&newUser~true"
     [6] ".bluenile.com\tTRUE\t/\tFALSE\t1443806275\tlocale\tver~1&country~IRL&currency~EUR&language~en-gb&productSet~BNUK"
     [7] ".bluenile.com\tTRUE\t/\tFALSE\t0\tbnses\tver~1&ace~false&isbml~false&fbcs~false&ss~0&mbpop~false&sswpu~false&deo~false"
     [8] ".bluenile.com\tTRUE\t/\tFALSE\t1727630275\tbnper\tver~5&NIB~0&DM~-&GUID~DA5C11F5_E468_46B5_B4E8_D551D4D6EA4D&SESS-CT~1&STC~32RPVK&FB_MINI~false&SUB~false"
     [9] "#HttpOnly_www.bluenile.com\tFALSE\t/\tFALSE\t0\tJSESSIONID\tB8475C3AEC08205E5AC6252C94E4B858"
    [10] ".bluenile.com\tTRUE\t/\tFALSE\t1727630278\tmigrationstatus\tver~1&redirected~false"
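Each entry in that list is a tab-separated line in the Netscape cookie-file layout (domain, subdomain flag, path, secure flag, expiry, name, value). To see at a glance which cookies were set, the lines can be split into a data frame. A rough sketch in base R; `parse_cookielist` is a hypothetical helper, and the two input lines are copied from the output above:

```r
# Hypothetical helper: split Netscape-format cookie lines into a data frame.
parse_cookielist <- function(lines) {
  # HttpOnly cookies carry a "#HttpOnly_" prefix in front of the domain.
  http_only <- startsWith(lines, "#HttpOnly_")
  lines <- sub("^#HttpOnly_", "", lines)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Pick the n-th tab-separated field from every line.
  pick <- function(n) vapply(fields, `[`, character(1), n)
  data.frame(
    domain    = pick(1),
    path      = pick(3),
    expires   = as.numeric(pick(5)),  # Unix time; 0 marks a session cookie
    name      = pick(6),
    value     = pick(7),
    http_only = http_only,
    stringsAsFactors = FALSE
  )
}

# Two of the lines from the cookie list shown above.
cookies <- c(
  ".bluenile.com\tTRUE\t/\tFALSE\t1443806275\tlocale\tver~1&country~IRL&currency~EUR&language~en-gb&productSet~BNUK",
  "#HttpOnly_www.bluenile.com\tFALSE\t/\tFALSE\t0\tJSESSIONID\tB8475C3AEC08205E5AC6252C94E4B858"
)
parsed <- parse_cookielist(cookies)
parsed[, c("domain", "name", "http_only")]
```

On a live handle the same helper could be applied as `parse_cookielist(getCurlInfo(curl)$cookielist)`.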

Source: https://habr.com/ru/post/1203883/