Error in curlMultiPerform(multiHandle): embedded nul in string

I am trying to download a vector of links, but I get an error message that I do not know how to handle. The code is included below; I am hoping someone has a workaround.

CODE:

    library(RCurl)
    library(XML)
    url <- "http://www.etfs.bmo.com/bmo-etfs/"
    url.parsed <- htmlParse(url)
    links <- xpathSApply(url.parsed, "//table//td/a/@href")[-c(1:3)]
    links <- paste0("http://www.etfs.bmo.com", links)
    pages <- getURI(links)

ERROR MESSAGE:

 Error in curlMultiPerform(multiHandle) : embedded nul in string: ' \r\n </nobr>\r\n </td>\r\n\t\t\t </tr>\r\n\t\t\t \r\n\t\t\t\t\t \r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t \r\n\t\t\t\t \t<tr valign="top" >\r\n\t \t\t\t\t<td class="highlightText"><strong>Annualized Distribution Yield \r\n\t\t \t\t\t\r\n\t\t \t \t\t\t\t\r\n\t\t \t \t\t\t(Jul 07, 2016)\r\n\t\t \t \t\t\t\t\r\n\t\t \t \t\t\t \r\n\t\t \t\t\t\t\t<sup>1</sup></strong>\r\n\t\t \t\t\t\t</td>\r\n\t\t\t \t\t<td>\r\n \t\t<nobr>\r\n \t \t\t\r\n \t \t\t\t\r\n \t \t\t\t\r\n\t\t\t \t \t\t\t\t2.41%\r\n \t \t\t\t\r\n \t\t\t\t \r\n \t \t</nobr>\r\ 
1 answer

Well, it took a while, but I think I figured it out.

It turns out that the web page declares the wrong encoding. It claims to be "ISO-8859-1", but some of the pages contain a trademark character encoded as \x99, which means the site is probably really using the "Windows-1252" code page. That byte lies outside the normal ASCII range, so it kicks off multibyte character reading, and the content quickly becomes corrupted.
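To see why the declared encoding matters, here is a minimal sketch that decodes the single byte \x99 under both code pages. It assumes the local iconv build recognizes the names "WINDOWS-1252" and "ISO-8859-1"; the variable names are only illustrative.

```r
# One byte, 0x99, wrapped in a length-one character string.
tm_byte <- rawToChar(as.raw(0x99))

# Decode the same byte under each candidate encoding.
as_cp1252 <- iconv(tm_byte, from = "WINDOWS-1252", to = "UTF-8")
as_latin1 <- iconv(tm_byte, from = "ISO-8859-1", to = "UTF-8")

# Compare the resulting Unicode code points.
cp1252_point <- utf8ToInt(as_cp1252)  # 8482, i.e. U+2122 TRADE MARK SIGN
latin1_point <- utf8ToInt(as_latin1)  # 153, i.e. U+0099, a C1 control character
print(c(cp1252 = cp1252_point, latin1 = latin1_point))
```

Under Windows-1252 the byte is the trademark sign, while under ISO-8859-1 it is an unprintable control character, which is consistent with the page being mislabeled.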

As far as I can tell, RCurl does not support this encoding natively. But you can still download each page as binary data and then convert it with iconv, which has more conversion options. This should work:

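    # Download every page as a raw vector so RCurl does no decoding
    raw <- lapply(links, getURLContent, binary=TRUE)
    # Read each raw vector as a string, then convert it from Windows-1252 to UTF-8
    pages <- lapply(lapply(raw, readBin, "character"), iconv, from="WINDOWS-1252", to="UTF-8")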

Note that I tested this on my Mac. The exact values accepted for from= and to= may vary by platform. Check the list returned by iconvlist() for a possible replacement for from= if this does not work on your computer.
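One way to look for a suitable name is to filter the output of iconvlist(). This is only a sketch: the pattern "1252" is an assumption about how the platform names the code page, and on some systems iconvlist() may return few or no entries.

```r
# List the encoding names the local iconv reports.
available <- iconvlist()

# Keep the names that mention 1252 (for example "CP1252" or "WINDOWS-1252").
candidates <- grep("1252", available, value = TRUE, ignore.case = TRUE)
print(candidates)
```

Any name printed here can be tried as the from= argument in the iconv call.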


Source: https://habr.com/ru/post/975981/

