RCurl uses an HTTP proxy by default, but Tor provides a SOCKS proxy. Tor is smart enough to recognize that the proxy client (RCurl) is trying to use it as an HTTP proxy, so the error message in the returned HTML comes from Tor itself.
To make RCurl (and curl) use a SOCKS proxy, add a protocol prefix to the proxy address. SOCKS5 has two prefixes: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the host name).
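The difference between the two prefixes can be sketched with a small helper that builds the proxy string. This is only an illustration: `socks_proxy_url` is a hypothetical name, not part of RCurl or curl, and the default address and port are taken from the examples in this answer.

```r
# Hypothetical helper (not part of RCurl): build a SOCKS5 proxy URL for curl.
# remote_dns = TRUE  -> "socks5h": the proxy (Tor) resolves host names.
# remote_dns = FALSE -> "socks5":  the client resolves host names locally.
socks_proxy_url <- function(host = "127.0.0.1", port = 9050, remote_dns = TRUE) {
  scheme <- if (remote_dns) "socks5h" else "socks5"
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

socks_proxy_url()                    # "socks5h://127.0.0.1:9050"
socks_proxy_url(remote_dns = FALSE)  # "socks5://127.0.0.1:9050"
```

The returned string is what goes into the `proxy` entry of the RCurl options list; with Tor, the default (`remote_dns = TRUE`) is the one you want.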
Here is a pure R solution that lets Tor handle the DNS queries.
    library(RCurl)
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
    my.handle <- getCurlHandle()
    html <- getURL(url = 'https://www.torproject.org', curl = my.handle)
If you want to set additional options, this is where to put them:
    library(RCurl)
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                                useragent = "Mozilla",
                                followlocation = TRUE,
                                referer = "",
                                cookiejar = "my.cookies.txt"))
    my.handle <- getCurlHandle()
    html <- getURL(url = 'https://www.torproject.org', curl = my.handle)