Since you did not provide enough information, it is difficult to say which way is better.
You can try unrolling the loop so that memory is accessed contiguously.
, mat[x][y] 4 , mat[x-1][y-m-1] 4 , mat[x-1][y] 4 , mat[x][y-1] 4 . 4 .
, . . SIMD-, 3/4 .
For example:
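// walk column y using 2D indexing
for( x=0; x<n; x++ )
    doSomething( mat[x][y] );
// same elements via one flat index with stride m (assumes mat is a contiguous n-by-m array)
for( x=y; x<n*m; x+=m )
    doSomething( mat[0][x] );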
lea.
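To check that the two loops above really visit the same elements, here is a minimal sketch. It assumes the matrix is stored as one contiguous row-major block of n rows and m columns, and it passes a plain pointer to the first element so the flat index never steps outside a declared row. The function names sum_column_2d and sum_column_flat are hypothetical, and the summation only stands in for doSomething.

```c
#include <stddef.h>

/* Sum column y of an n-by-m row-major matrix, computing the
   row offset x * m on every iteration (what mat[x][y] does). */
long sum_column_2d(const int *mat, size_t n, size_t m, size_t y)
{
    long sum = 0;
    for (size_t x = 0; x < n; x++)
        sum += mat[x * m + y];
    return sum;
}

/* Same traversal with a single flat index that starts at y
   and advances by the row stride m, so no multiply is needed. */
long sum_column_flat(const int *mat, size_t n, size_t m, size_t y)
{
    long sum = 0;
    for (size_t i = y; i < n * m; i += m)
        sum += mat[i];
    return sum;
}
```

For an array declared as int a[3][4], both functions can be called with &a[0][0], 3, 4 and a column number, and they should return the same value. Whether the flat form is actually faster depends on the compiler, which may already perform this transformation on its own.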