I did some performance experiments in Go with matrix multiplication and came across some unexpected results.
Version 1:
func newMatrix(n int) [][]int { m := make([][]int, n) buf := make([]int, n*n) for i := range m { m[i] = buf[i*n : (i+1)*n] } return m } func mult1(m1, m2, res [][]int) [][]int { for i := range m1 { for k := range m1[0] { for j := range m2[0] { res[i][j] += m1[i][k] * m2[k][j] } } } return res }
From a linear array, I create several slices that represent the rows of the matrix.
Version 2:
func mult2(m1, m2, res []int, n int) []int { for i := 0; i < n; i++ { for k := 0; k < n; k++ { for j := 0; j < n; j++ { res[i*n+j] += m1[i*n+k] * m2[k*n+j] } } } return res }
In this version, I just use a linear array and index into it from multiplication.
Matrix 2 Multiplication 2048x2048 gives the following runtime:
version 1: 35.550813801s version 2: 19.090223468s
Version 2 is almost twice as fast.
I used the approach below to measure:
start := time.Now() mult(m1, m2, m3) stop := time.Now()
I know that using slices provides another level of indirection that can affect cache performance, but I did not expect this to be such a big difference. Unfortunately, I did not find any good tool that works with a Mac that can analyze the effectiveness of the cache in Go, so I canβt say for sure what exactly causes the difference in performance.
So, I think I'm asking, is this the expected behavior or is there something I am missing?
Software and hardware: Go version 1.4.2 darwin / amd64; OS X 10.10.3; 2 GHz Quad Core i7.