Slow matrix multiplication performance using MTJ / Netlib (native)

I need to multiply large matrices, ranging in size from 5000x5000 to 20000x20000. I have had trouble finding a library that both supports sparse matrices and can still perform fast multiplication.

First of all, I read the earlier question about the performance of Java matrix libraries ( Performance of Java matrix math libraries? ). Based on the top answer there, I decided to go with JBLAS, since it was one of the fastest. In my case, it took about 50 seconds to multiply a 5000x5000 matrix, which is quite a bit slower than Matlab, but still tolerable.

The problem is that the matrices can be quite large (up to 20k x 20k or more), but they are usually sparse: only about 30% of the elements are nonzero. JBLAS does not provide a sparse matrix implementation, so the memory required to store a large dense matrix can become prohibitive. I tried switching to MTJ / Netlib, since it was supposed to be one of the better libraries in that benchmark that supports sparse matrices. A note here ( https://github.com/fommil/netlib-java/ ) says that to get the best performance I should compile my own BLAS on my machine. So I downloaded OpenBLAS, compiled and installed it, and then ran a few more commands to set it up as the system BLAS on Ubuntu 13.10:
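To put numbers on that memory pressure, here is a quick back-of-the-envelope sketch (a hypothetical helper, not part of JBLAS or MTJ) comparing a dense layout of doubles against a CSR-style sparse layout at 30% fill:

```java
// Hypothetical helper: rough storage estimates, not part of any library.
public class StorageCost {

    // Dense layout: rows * cols doubles at 8 bytes each.
    static long denseBytes(long rows, long cols) {
        return rows * cols * 8;
    }

    // CSR-style layout: 8 bytes per stored value, 4 bytes per column index,
    // plus (rows + 1) 4-byte row pointers.
    static long csrBytes(long rows, long cols, double density) {
        long nnz = Math.round(rows * cols * density);
        return nnz * 8 + nnz * 4 + (rows + 1) * 4;
    }

    public static void main(String[] args) {
        System.out.println(denseBytes(20000, 20000));     // 3.2 GB for a dense 20k x 20k
        System.out.println(csrBytes(20000, 20000, 0.30)); // ~1.44 GB at 30% fill
    }
}
```

So even at 30% fill, a CSR-style format saves a bit more than half the memory, which starts to matter once matrices approach the 20k x 20k range.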

 $ cd ~/build/OpenBLAS
 $ make
 $ sudo make install PREFIX=/usr/local/openblas
 $ echo "/usr/local/openblas/lib" | sudo tee /etc/ld.so.conf.d/openblas.conf
 $ sudo ldconfig
 $ sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /usr/local/openblas/lib/libopenblas.so 90
 $ sudo update-alternatives --config libblas.so.3

I selected my compiled OpenBLAS library in the last update-alternatives step, so I assume that Netlib now picks up my compiled OpenBLAS and uses it. I also ran some benchmarks from http://r.research.att.com/benchmarks/R-benchmark-25.R and observed a noticeable speedup between the "before" case (using Ubuntu's default BLAS) and the "after" case (using my compiled OpenBLAS).
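If you want to double-check which BLAS the dynamic linker will actually hand to netlib-java, a couple of diagnostic commands may help (the paths assume the install prefix used above):

```shell
# Show which libblas.so.3 alternative is currently selected
update-alternatives --display libblas.so.3

# Confirm the dynamic linker can resolve the OpenBLAS shared library
ldconfig -p | grep openblas
```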

However, matrix-matrix multiplication in MTJ is still very slow. For example, I have two matrices, A = 5824x5824 and W = 5824x4782, and I multiply them in Java like this:

 Matrix AW = new FlexCompRowMatrix(A.numRows(), W.numColumns());
 A.mult(W, AW);

The code has been running for more than 45 minutes (long enough to write this entire post), and it still has not finished. Using JBLAS, the same matrix multiplication takes less than a minute. Is there something I missed?

Thanks!

2 answers

JBLAS performs dense matrix operations. MTJ handles both dense and sparse. Using sparse matrices in a dense way is slow, and FlexCompRowMatrix creates a sparse matrix.
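There is also a structural reason the result should be dense. If both factors have about 30% of their entries nonzero, an entry of the product is zero only when all ~5824 terms along the inner dimension vanish, which essentially never happens. A quick sketch (hypothetical helper, assuming independently placed nonzeros):

```java
// Hypothetical estimate: probability that an entry of A*B is structurally
// nonzero, assuming nonzeros are placed independently with density p in
// both factors and the inner dimension is n.
public class ProductDensity {

    static double productDensity(double p, int n) {
        // Entry (i, j) is zero only if every one of the n terms
        // A(i,k) * B(k,j) is zero, each with probability (1 - p * p).
        return 1.0 - Math.pow(1.0 - p * p, n);
    }

    public static void main(String[] args) {
        System.out.println(productDensity(0.30, 5824)); // effectively 1.0
    }
}
```

So allocating the product as a FlexCompRowMatrix buys nothing: it ends up fully populated, but with all the per-insert overhead of an incremental sparse structure. A dense container is the right choice for the result.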

What you want to do, to compare directly with JBLAS, is:

 Matrix a = new DenseMatrix(5000, 5000);
 Matrix b = new DenseMatrix(5000, 5000);
 Matrix c = new DenseMatrix(5000, 5000);
 a.multAdd(b, c);

Performance using MTJ + OpenBLAS should be about the same as Matlab's.


See http://jeshua.me/blog/NetlibJavaJNI for a usage demonstration, and note that you may have to update the package names in its test code.

For example, you might need to change:

 Class javaBlasclass = Class.forName("org.netlib.blas.JBLAS");

to:

 Class javaBlasclass = com.github.fommil.netlib.BLAS.class;


Source: https://habr.com/ru/post/958116/

