How to use a Tor SOCKS5 proxy with getURL in R

I want to route getURL requests in R through Tor. Tor itself works (tested in Firefox) as a SOCKS5 proxy on port 9050. But when I set the same proxy in R, I get the following error:

 html <- getURL("http://www.google.com", followlocation = T, .encoding="UTF-8", .opts = list(proxy = "127.0.0.1:9050", timeout=15)) 

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : '\n\nTor is not an HTTP Proxy\n\n\n

Tor is not an HTTP Proxy

\n

\nIt appears you have configured your web browser to use Tor as an HTTP proxy.\nThis is not correct: Tor is a SOCKS proxy, not an HTTP proxy.\nPlease configure your client accordingly.

I tried replacing the proxy option with socks and socks5, but that didn't work.

+6
4 answers

There are curl bindings for R, with which you can use curl to call the Tor SOCKS5 proxy.

A call from the shell (which you can translate to an R binding):

curl --socks5-hostname 127.0.0.1:9050 google.com

With this flag, Tor also does the DNS resolution (A records) for you.
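One possible translation of that shell call into R, as a minimal sketch. It assumes the RCurl package is installed and that Tor listens on 127.0.0.1:9050, as in the question. The helper names `tor_proxy_url` and `tor_get` are hypothetical and exist only for illustration. libcurl documents the `socks5h://` proxy scheme as the counterpart of `--socks5-hostname`, so hostnames are resolved by Tor rather than locally.

```r
# Hypothetical helper: build the proxy URL for a local Tor SOCKS5 listener.
# The "socks5h" scheme tells libcurl to let the proxy resolve hostnames,
# which is what curl's --socks5-hostname flag does on the command line.
tor_proxy_url <- function(host = "127.0.0.1", port = 9050) {
  sprintf("socks5h://%s:%d", host, as.integer(port))
}

# Hypothetical helper: fetch a URL through Tor. It needs the RCurl package
# and a running Tor instance, so it is only defined here, not called.
tor_get <- function(url, proxy = tor_proxy_url()) {
  RCurl::getURL(url, .opts = list(proxy = proxy,
                                  followlocation = TRUE,
                                  timeout = 15))
}
```

With Tor running, a call such as `html <- tor_get("http://www.google.com")` would be the rough equivalent of the shell command above.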

+7

RCurl treats a proxy as an HTTP proxy by default, but Tor provides a SOCKS proxy. Tor is smart enough to recognize that the proxy client (RCurl) is trying to use it as an HTTP proxy, which is why the HTML error message comes back from Tor itself.

To get RCurl and curl to use a SOCKS proxy, add a protocol prefix to the proxy address. For SOCKS5 there are two prefixes: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the host name).

Here is a pure R solution that also uses Tor for DNS queries.

 library(RCurl)
 options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
 my.handle <- getCurlHandle()
 html <- getURL(url='https://www.torproject.org', curl=my.handle)

If you want to set additional options, this shows where to put them:

 library(RCurl)
 options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                             useragent = "Mozilla",
                             followlocation = TRUE,
                             referer = "",
                             cookiejar = "my.cookies.txt"))
 my.handle <- getCurlHandle()
 html <- getURL(url='https://www.torproject.org', curl=my.handle)
+7

Hello Naparst. I would really appreciate it if you could show how to implement the solution you propose. I assume the option should be something like opts <- list(socks5.hostname = "127.0.0.1:9050"), but this does not work, since socks5.hostname is not an option.
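A hedged sketch of what such an option list might look like. RCurl option names follow libcurl's `CURLOPT_*` names, which include `proxy` and `proxytype`. The hostname-resolving behaviour can be requested either through the `socks5h://` prefix or through `proxytype = 7`, the value that curl.h assigns to `CURLPROXY_SOCKS5_HOSTNAME`. The helper name `socks5_hostname_opts` is hypothetical, and the numeric variant is an untested assumption about how RCurl passes the value through.

```r
# Hypothetical helper: build an RCurl option list for a SOCKS5 proxy that
# also resolves hostnames (what the curl CLI calls --socks5-hostname).
socks5_hostname_opts <- function(address = "127.0.0.1:9050", use_prefix = TRUE) {
  if (use_prefix) {
    # Variant 1: encode the proxy type in the URL scheme.
    list(proxy = paste0("socks5h://", address))
  } else {
    # Variant 2 (assumption): plain address plus libcurl's numeric proxy
    # type; 7 is CURLPROXY_SOCKS5_HOSTNAME in curl.h.
    list(proxy = address, proxytype = 7)
  }
}

opts <- socks5_hostname_opts()
# With RCurl loaded and Tor running, the list would be passed per call:
# html <- getURL("https://www.torproject.org", .opts = opts)
```

Passing the list through `.opts` keeps the proxy setting local to one request instead of changing the global `RCurlOptions`.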

+2

On Mac OS X, install the Tor Bundle for Mac and Privoxy, and then update the proxy settings in the system settings.

"System Settings" → "Wi-Fi" → "Advanced" → "Proxies" → set "Web Proxy (HTTP)" to web proxy server 127.0.0.1:8118

"System Settings" → "Wi-Fi" → "Advanced" → "Proxies" → set "Secure Web Proxy (HTTPS)" to 127.0.0.1:8118 → "OK" → "Apply"

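 library(RCurl)  # package names are case-sensitive
 curl <- getCurlHandle()
 # 9150 is the SOCKS port of the Tor Browser bundle; proxytype = 5 selects SOCKS5
 curlSetOpt(proxy='127.0.0.1:9150', proxytype=5, curl=curl)
 html <- getURL(url='check.torproject.org', curl=curl)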
+2

Source: https://habr.com/ru/post/950534/

