Here is a rather esoteric solution using the special features of MATLAB. Build a huge 4k x 4k sparse matrix with 4x4 blocks diagonally. Then allow all at once.
On my machine, this solution gets the same solution up to the same accuracy as the @ Amro / Tom for-loop solution, but faster.
n = size(A,1); k = size(A,3); AS = reshape(permute(A,[1 3 2]),n*k,n); S = sparse( ... repmat(1:n*k,n,1)', ... bsxfun(@plus,reshape(repmat(1:n:n*k,n,1),[],1),0:n-1), ... AS, ... n*k,n*k); X = reshape(S\repmat(B,k,1),n,k);
for a random example:
For k = 10000 For loop: 0.122570 seconds. Giant sparse system: 0.032287 seconds.
If you know that your 4x4 matrices are positive definite, you can use chol on S to increase accuracy.
This is stupid. But just as the slow matlab for loops is still in 2015, even with JIT. This solution seems to find a sweet spot when k is not too large, so everything still fits into memory.
source share