From your code, it's pretty hard to figure out what you are trying to achieve. I think you want to calculate the matrix d[0..m, 0..n] as follows:
    +---------+-------------------------+
    |   0.0   | b00 b10 ...... b(n-1)0  |
    +---------+-------------------------+
    |   a00   | d11 d12 ...... d1n      |
    |   a10   | d21 d22 ...... d2n      |
    |   ...   | ... ... ...... ...      |
    |   ...   | ... ... ...... ...      |
    |   ...   | ... ... ...... ...      |
    | a(m-1)0 | dm1 dm2 ...... dmn      |
    +---------+-------------------------+
where the main part (the inner matrix d[1..m, 1..n]) is the product of three matrices: matA1 (matA with its first column removed), matC, and matB1 (matB with its first column removed, then transposed).
A good way to check that this works is to look at the matrix sizes. Let ra, ca, rb, cb, rc and cc denote the number of rows and columns of matA, matB and matC respectively. The product involves three matrices of sizes ra x (ca-1), rc x cc and (cb-1) x rb; it is only defined if rc = ca-1 and cc = cb-1. The resulting matrix d then has size (ra+1) x (rb+1), so m = ra and n = rb.
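To make the layout and the size conditions concrete, here is a minimal loop-based sketch of the same calculation on plain `float[,]` arrays. It is only meant as a reference for checking results on small inputs: the name `referenceCalculate` is hypothetical, it does not use the matrix type from your code, and it is my reading of what you want rather than your original algorithm.

```fsharp
// Hypothetical reference version on plain 2D arrays (not the matrix type used in the question).
// It is deliberately naive (four nested loops), so use it on small inputs only.
let referenceCalculate (a: float[,]) (b: float[,]) (c: float[,]) =
    let ra, ca = Array2D.length1 a, Array2D.length2 a
    let rb, cb = Array2D.length1 b, Array2D.length2 b
    let rc, cc = Array2D.length1 c, Array2D.length2 c
    // The product A1 * C * B1^T is only defined when the inner sizes match.
    if rc <> ca - 1 || cc <> cb - 1 then
        invalidArg "c" (sprintf "expected a %dx%d matrix, got %dx%d" (ca - 1) (cb - 1) rc cc)
    let d = Array2D.create (ra + 1) (rb + 1) 0.0
    // First column of d (below the corner) is the first column of a.
    for i in 0 .. ra - 1 do
        d.[i + 1, 0] <- a.[i, 0]
    // First row of d (right of the corner) is the first column of b, transposed.
    for j in 0 .. rb - 1 do
        d.[0, j + 1] <- b.[j, 0]
    // Inner block: d[i+1, j+1] = sum over k, l of a[i, k+1] * c[k, l] * b[j, l+1].
    for i in 0 .. ra - 1 do
        for j in 0 .. rb - 1 do
            let mutable sum = 0.0
            for k in 0 .. rc - 1 do
                for l in 0 .. cc - 1 do
                    sum <- sum + a.[i, k + 1] * c.[k, l] * b.[j, l + 1]
            d.[i + 1, j + 1] <- sum
    d

// Tiny usage example: 1x2, 1x2 and 1x1 inputs give a 2x2 result.
let a = array2D [ [ 1.0; 2.0 ] ]
let b = array2D [ [ 3.0; 4.0 ] ]
let c = array2D [ [ 5.0 ] ]
let d = referenceCalculate a b c
// d.[1, 0] = 1.0, d.[0, 1] = 3.0, d.[1, 1] = 2.0 * 5.0 * 4.0 = 40.0
```

If matC does not have size (ca-1) x (cb-1), the sketch fails early with an argument error instead of silently producing a wrong result.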
Here is my attempt without using a for loop:
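    let calculate (matA : matrix) (matB : matrix) (matC : matrix) =
        let ra = matA.NumRows
        let ca = matA.NumCols
        let rb = matB.NumRows
        let cb = matB.NumCols
        // Result has one extra row and one extra column for the headers.
        let matrixCalcul = Matrix.zero (ra+1) (rb+1)
        // First column of the result: first column of matA.
        matrixCalcul.[1.., 0..0] <- matA.[0.., 0..0]
        // First row of the result: first column of matB, transposed.
        matrixCalcul.[0..0, 1..] <- matB.[0.., 0..0].Transpose
        // Inner block: (matA without its first column) * matC * (matB without its first column), transposed.
        matrixCalcul.[1.., 1..] <- (matA.Columns(1, ca-1) * matC) * matB.Columns(1, cb-1).Transpose
        matrixCalcul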
I tested with matA, matB and matC of sizes 200x279, 200x1279 and 278x1238 respectively. Both versions give the same result, and my function is about 40x faster than the original. There are many reasons for this, but in general the vectorized version performs much better for matrix calculations.