Julia: Parallel for loop with large data movement

I want to run a parallel loop. I need each of my processes to have access to two large dictionaries, gene_dict and transcript_dict. This is what I tried first:

    @everywhere(function EM
        ...
    end)

    generefs  = [@spawnat i genes           for i in 2:nprocs()]
    dict1refs = [@spawnat i gene_dict       for i in 2:nprocs()]
    dict2refs = [@spawnat i transcript_dict for i in 2:nprocs()]

    result = @parallel (vcat) for i in 1:length(genes)
        EM(genes[i], gene_dict, transcript_dict)
    end

but I get the following error for all processes (not just 5):

    exception on 5: ERROR: genes not defined
     in anonymous at no file:1514
     in anonymous at multi.jl:1364
     in anonymous at multi.jl:820
     in run_work_thunk at multi.jl:593
     in run_work_thunk at multi.jl:602
     in anonymous at task.jl:6
    UndefVarError(:genes)

I thought that @spawnat would move the three data structures I need to all of the processes. My first thought was that this step might take some time and the parallel loop was trying to start before the data transfer had completed.
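
One way to test that hypothesis is to block on the RemoteRefs returned by @spawnat until every transfer has finished before starting the loop. A minimal sketch, assuming the generefs, dict1refs and dict2refs arrays from the code above and the Julia 0.4-era parallel API:

    # Block until the value behind each RemoteRef is available on its worker,
    # so the @parallel loop cannot start before the transfers have completed.
    for r in [generefs; dict1refs; dict2refs]
        wait(r)
    end

As the answer below explains, however, the failure is about how the data is named on the workers rather than about timing, so waiting on the refs does not by itself make genes visible inside the loop.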

1 answer

The data is moved by @spawnat, but it is not bound to a variable with the same name as on the main node. Instead, it is stored on the workers in a fairly hidden Dict called Base.PGRP. To access the values you need to fetch the RemoteRef, which in your case will be something like

    result = @parallel (vcat) for i in 1:length(genes)
        EM(fetch(genes[i]), fetch(gene_dict[i]), fetch(transcript_dict[i]))
    end
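
To make the mechanism concrete, here is a small check that can be run from the master process. It is only a sketch under the assumptions of the question (a Julia 0.4-era session, gene_dict defined on the master, and at least worker 2 started with addprocs); it illustrates the two points above: @spawnat does not create a gene_dict binding on the worker, but the shipped value is reachable through the returned RemoteRef.

    # Ship gene_dict to worker 2 and keep the handle to it.
    r = @spawnat 2 gene_dict

    # No global named gene_dict exists on the worker; the value sits in the
    # worker-side store (Base.PGRP), not in Main.
    remotecall_fetch(2, () -> isdefined(Main, :gene_dict))    # -> false

    # The value is still reachable through the RemoteRef; fetching it on the
    # worker that stores it returns the local copy without another transfer.
    remotecall_fetch(2, rr -> length(fetch(rr)), r)           # -> length(gene_dict)

The same applies to genes and transcript_dict, which is why the loop body has to go through the fetched refs instead of referring to those names directly.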


Source: https://habr.com/ru/post/1209922/

