It depends on the size of the matrix and the number of iterations you need to perform. To use the GPU, you have to copy the matrix data from CPU memory to GPU memory and then copy the results back from the GPU to the CPU. If you are going to perform only one iteration on the matrix, it is usually better to do it on the CPU rather than on the GPU, because nothing amortizes the transfer cost. In addition, the GPU has a startup cost. So if you have many iterations to run on the same data, go with the GPU; otherwise my choice would be the CPU. Matrix size affects performance in the same way, since the amount of data to copy grows with it.
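To make the trade-off concrete, here is a rough cost model in Python. It is only a sketch: the function names and all timing values are hypothetical, chosen for illustration rather than measured on any real hardware, and it assumes the data stays on the GPU between iterations so the copies are paid once.

```python
def cpu_time(n_iters, t_cpu_iter):
    """Total CPU time: no transfers, just n iterations."""
    return n_iters * t_cpu_iter


def gpu_time(n_iters, t_gpu_iter, t_copy_in, t_copy_out, t_startup):
    """Total GPU time: one-time startup and copies, plus n iterations.

    Assumes the matrix is copied in once and the result copied out once.
    """
    return t_startup + t_copy_in + n_iters * t_gpu_iter + t_copy_out


def break_even_iterations(t_cpu_iter, t_gpu_iter, t_copy_in, t_copy_out, t_startup):
    """Smallest iteration count at which the GPU is strictly faster.

    Returns None if a GPU iteration is not faster than a CPU iteration,
    since the fixed overhead can then never be recovered.
    """
    if t_gpu_iter >= t_cpu_iter:
        return None
    overhead = t_startup + t_copy_in + t_copy_out
    saved_per_iter = t_cpu_iter - t_gpu_iter
    # GPU wins when overhead + n * t_gpu < n * t_cpu, i.e. n > overhead / saved
    return int(overhead // saved_per_iter) + 1


# Hypothetical timings in microseconds (illustrative only, not measurements).
params = dict(t_gpu_iter=100, t_copy_in=2000, t_copy_out=2000, t_startup=5000)

print(cpu_time(1, 1000), gpu_time(1, **params))      # 1000 9100
print(break_even_iterations(1000, **params))         # 11
```

With these made-up numbers a single iteration is far cheaper on the CPU, and the GPU only pulls ahead from the 11th iteration on. A larger matrix raises both the copy times and the per-iteration times, so the break-even point has to be measured for your own sizes and hardware.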