How do you do parallel matrix multiplication in Julia?

Is there a way to do parallel matrix multiplication in Julia? I tried using DArrays, but it was significantly slower than plain single-threaded multiplication.

3 answers

Parallel in what sense? If you mean single-machine, multi-threaded, then Julia does this by default, since OpenBLAS (the linear algebra library it uses) is multi-threaded.
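For reference, the number of BLAS threads can be inspected and changed from within Julia. This is a minimal sketch, assuming Julia 1.6 or later (where `BLAS.get_num_threads` is available in the `LinearAlgebra` standard library); the matrix size and the choice of one thread are arbitrary and not taken from the answer.

```julia
using LinearAlgebra

n = 500                                    # arbitrary size, for illustration only
a = rand(n, n)
b = rand(n, n)

default_threads = BLAS.get_num_threads()   # thread count the BLAS backend starts with
c_multi = a * b                            # dense product, handled by the BLAS library

BLAS.set_num_threads(1)                    # force single-threaded BLAS
c_single = a * b                           # same product, now on one thread

BLAS.set_num_threads(default_threads)      # restore the previous setting
```

Wrapping either product in `@time` should show the effect of the thread count on a given machine.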

If you mean multi-machine, distributed-computing style, then you will run into high communication overhead, which only pays off for very large problems, and a customized approach may be necessary.


Most likely, the problem is that direct (possibly single-threaded) matrix multiplication is normally performed by an optimized library. In the case of OpenBLAS, this is already multithreaded. For 2000x2000 arrays, a simple matrix multiplication

 @time c = sa * sb; 

takes about 0.3 seconds multithreaded and 0.7 seconds single-threaded.

If the multiplication is split along one dimension, the times get even worse and reach about 17 seconds in single-threaded mode.

 @time for j = 1:n
     sc[:,j] = sa[:,:] * sb[:,j]
 end
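Part of the slowdown in the loop above likely comes from `sa[:,:]`, which copies the whole matrix on every iteration, and from replacing one matrix-matrix product with n matrix-vector products. The following is a minimal sketch that avoids the copies and checks that both forms give the same result. It assumes ordinary `Array`s instead of shared arrays to stay self-contained, and the size is arbitrary.

```julia
using LinearAlgebra

n = 200                        # arbitrary size, for illustration only
a = rand(n, n)
b = rand(n, n)

c_full = a * b                 # one matrix-matrix product

c_cols = zeros(n, n)
for j in 1:n
    # views avoid copying; mul! writes the matrix-vector product in place
    mul!(view(c_cols, :, j), a, view(b, :, j))
end
```

Even without the copies, the column-by-column form is usually slower than the single product, because the library cannot reuse blocks of `a` across columns.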

Shared arrays

A solution to your problem may be shared arrays, which give all processes on the same machine access to the same data. Note that shared arrays are still marked as experimental.

 # create shared arrays and initialize them with random numbers
 sa = SharedArray(Float64,(n,n),init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
 sb = SharedArray(Float64,(n,n),init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
 sc = SharedArray(Float64,(n,n));

Then you need to define a function that performs a cheap matrix multiplication on one part of the matrix.

 @everywhere function mymatmul!(n,w,sa,sb,sc)
     # works only for 4 workers and n divisible by 4
     range = 1+(w-2) * div(n,4) : (w-1) * div(n,4)
     sc[:,range] = sa[:,:] * sb[:,range]
 end

Finally, the main process tells workers to work on their part.

 @time @sync begin
     for w in workers()
         @async remotecall_wait(w, mymatmul!, n, w, sa, sb, sc)
     end
 end

This takes about 0.3 seconds, which is the same as the multithreaded single-process time.
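The snippets above use the `SharedArray` constructor, `localindexes`, and the `remotecall_wait` argument order of an older Julia release. The following is a hedged sketch of the same idea for Julia 1.x, assuming the `Distributed` and `SharedArrays` standard libraries, four local worker processes, and an arbitrary n divisible by four. The chunk index `k` and the `nparts` argument are introduced here for illustration and are not part of the original answer.

```julia
using Distributed
addprocs(4)                         # assumption: four local worker processes
@everywhere using SharedArrays

n = 400                             # arbitrary size, divisible by the worker count

# each process fills its own chunk of the shared array with random numbers
init_rand = s -> s[localindices(s)] = rand(length(localindices(s)))
sa = SharedArray{Float64}((n, n), init = init_rand)
sb = SharedArray{Float64}((n, n), init = init_rand)
sc = SharedArray{Float64}((n, n))

# multiply sa by the k-th of nparts column blocks of sb and store it in sc
@everywhere function mymatmul!(k, nparts, sa, sb, sc)
    width = div(size(sb, 2), nparts)            # assumes an even split
    cols = (k - 1) * width + 1 : k * width
    sc[:, cols] = sdata(sa) * sb[:, cols]
    return nothing
end

ws = workers()
@sync for (k, w) in enumerate(ws)
    # Julia 1.x takes the function first, then the worker id
    @async remotecall_wait(mymatmul!, w, k, length(ws), sa, sb, sc)
end
```

Numbering the chunks with `enumerate` instead of deriving them from the worker id removes the dependence on the workers having ids 2 through 5.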


It looks like you are interested in dense matrices, in which case see the other answers. If you are (or become) interested in sparse matrices, see https://github.com/madeleineudell/ParallelSparseMatMul.jl .


Source: https://habr.com/ru/post/1236928/

