Error in parallel R: Error in serialize(data, node$con) : error writing to connection

I have seen several other posts on this topic, but none of them matches the problem I am facing. Here goes:

I run a parallel computation using:

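    library(doParallel)   # also attaches foreach and parallel

    cores <- detectCores()
    cl <- makeCluster(8L, outfile = "output.txt")
    registerDoParallel(cl)
    x <- foreach(i = seq_along(y), .combine = 'list', .multicombine = TRUE,
                 .packages = c('httr', 'jsonlite'),
                 .verbose = FALSE, .inorder = FALSE) %dopar% {
      my_function(y[i])   # placeholder for the actual worker function
    }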

This often works fine, but now throws an error:

    Error in serialize(data, node$con) : error writing to connection

When I look at the output.txt file, I see:

    starting worker pid=11112 on localhost:11828 at 12:38:32.867
    starting worker pid=10468 on localhost:11828 at 12:38:33.389
    starting worker pid=4996 on localhost:11828 at 12:38:33.912
    starting worker pid=3300 on localhost:11828 at 12:38:34.422
    starting worker pid=10808 on localhost:11828 at 12:38:34.937
    starting worker pid=5840 on localhost:11828 at 12:38:35.435
    starting worker pid=8764 on localhost:11828 at 12:38:35.940
    starting worker pid=7384 on localhost:11828 at 12:38:36.448
    Error in unserialize(node$con) :
      embedded nul in string: '\0\0\0\006SYMBOL\0\004\0\t\0\0\0\003')'\0\004\0\t\0\0\0\004expr\0\004\0\t\0\0\0\004expr\0\004\0\t\0\0\0\003','\0\004\0\t\0\0\0\024SYMBOL_FUN'
    Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
    Execution halted

This error is intermittent. There is plenty of memory (32 GB), and there are no other large R objects in memory. The function in the parallel code pulls several small JSON objects from the cloud and stores them in an R object, so no large data files are involved. I do not know why it sometimes hits the embedded nul and stops.

I have a similar problem with a function that pulls CSV files from the cloud. Both functions had worked fine under R 3.3.0 and R 3.4.0 until now.

I am using R 3.4.1 and RStudio 1.0.143 on Windows.

Here is my sessionInfo

    sessionInfo()
    R version 3.4.1 (2017-06-30)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] RJSONIO_1.3-0     RcppBDT_0.2.3     zoo_1.8-0         data.table_1.10.4 doParallel_1.0.10 iterators_1.0.8
    [7] RQuantLib_0.4.2   foreach_1.4.3     httr_1.2.1

    loaded via a namespace (and not attached):
    [1] Rcpp_0.12.12     lattice_0.20-35  codetools_0.2-15 grid_3.4.1       R6_2.2.2         jsonlite_1.5     tools_3.4.1
    [8] compiler_3.4.1

UPDATE

Now I get another similar error:

    Error in unserialize(node$con) : ReadItem: unknown type 100, perhaps written by later version of R

The embedded nul error seems to have gone away. I also tried deleting .Rhistory and .RData, as well as deleting my packages subfolder and reinstalling all packages. At least this new error seems to be consistent. I cannot find out what "unknown type 100" means.
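The text "perhaps written by later version of R" is what unserialize() reports when it meets a type code it does not recognise. A cheap first check, sketched below under the assumption of a local PSOCK cluster with two workers, is to confirm that every worker runs the same R build as the master session. If the versions match, a corrupted or out-of-sync connection stream is plausibly a likelier explanation than a real version mismatch.

```r
library(parallel)

# Assumption for illustration: a small local PSOCK cluster with 2 workers.
cl <- makeCluster(2L)

# Ask each worker which R build it is running and compare with the master.
worker_versions <- unlist(clusterEvalQ(cl, R.version.string))
same_version <- all(worker_versions == R.version.string)
print(same_version)

# Always release the workers when done.
stopCluster(cl)
```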

2 answers

I get a similar error. It usually occurs the next time I run the script after one of my previous scripts hit an error or I stopped it early. This may be related to what you mention: "I do not know why it sometimes hits the embedded nul and stops", which may be that error.

This post has good information, especially the advice to leave 1 core free for regular Windows processes. It also says: "If you get an error from any of these functions, it usually means that at least one of the workers has died", which could support my theory that the failure follows an earlier error.

doParallel error in R: Error in serialize(data, node$con) : error writing to connection
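Both pieces of advice can be sketched with base `parallel` alone: size the cluster to leave one core free, then ask every worker for its process id as a quick liveness check. The "cores minus one" rule and the pid check are illustrative choices, not something the linked post prescribes. A worker that has died would typically make the `clusterEvalQ()` call fail with a connection error like the ones quoted in the question.

```r
library(parallel)

# Leave one logical core free for Windows, RStudio and other processes.
# detectCores() can return NA, in which case this falls back to one worker.
n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)
cl <- makeCluster(n_workers)

# Liveness check: each running worker reports its own process id.
pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
print(pids)

stopCluster(cl)
```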

So far, my workaround has been to re-initialize the parallel backend by running this again:

 registerDoParallel(cl) 

It usually works after that, but I notice that the previous multi-core sessions in my Task Manager do not go away, even after running:

 stopCluster(cl) 

That is why I sometimes restart R.
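An alternative to re-registering a possibly broken cluster is to create a fresh one for every run and tear it down in `on.exit()`, which runs whether the loop finishes, fails or is interrupted. A minimal sketch, assuming a hypothetical helper `run_parallel()` and two workers: the idea is that a new run never reads from a socket that an aborted run may have left holding unread results.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# Hypothetical helper: fresh cluster per call, always cleaned up on exit.
run_parallel <- function(y, fun, n_workers = 2L) {
  cl <- makeCluster(n_workers)
  on.exit({
    stopCluster(cl)
    registerDoSEQ()  # do not leave foreach pointing at a stopped cluster
  }, add = TRUE)
  registerDoParallel(cl)
  # `fun` is found in this function's frame and exported to the workers.
  foreach(v = y) %dopar% fun(v)
}

res <- run_parallel(1:4, function(v) v^2)
```

Creating a cluster costs a second or so on Windows, so this trades some startup time for not having to restart R after an aborted job.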


I also noticed that the multi-core sessions do not disappear from Task Manager.

Switching from stopCluster(cl) to stopImplicitCluster() worked for me. From my reading, stopImplicitCluster() is meant to be used with the "one-line" registerDoParallel(cores = x), as opposed to:

    cl <- makeCluster(x)
    registerDoParallel(cl)

My gut feeling is that the way Windows handles clusters requires stopImplicitCluster(), but your mileage may vary.
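For comparison, a sketch of the "one-line" pattern this answer refers to, assuming 2 cores. Here doParallel builds the cluster itself (a PSOCK cluster on Windows, forked workers via the multicore backend on Unix-alikes), so there is no `cl` object to pass to `stopCluster()`, and `stopImplicitCluster()` is the matching teardown.

```r
library(doParallel)  # also attaches foreach and parallel

# One-line registration: doParallel creates an implicit cluster.
registerDoParallel(cores = 2)

res <- foreach(i = 1:3, .combine = c) %dopar% (i * 10)
print(res)

# Shut down the implicit cluster created by registerDoParallel(cores = ...).
stopImplicitCluster()
```

When the cluster is built explicitly with `makeCluster()`, `stopCluster(cl)` remains the corresponding call; which teardown applies depends on how the backend was registered.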

I would comment, but this is (cue drumroll) MY FIRST STACK OVERFLOW POST!!!


Source: https://habr.com/ru/post/1270339/

