Here is a solution using bsxfun
A = magic(3); T = [-1 1] T = diag(T); M=bsxfun(@times,permute(A,[3,1,4,2]),permute(T,[1,3,2,4])); M=reshape(M,size(T).*size(A));
It creates a 4D matrix, where the individual blocks are M(:,i,:,j) , then converted to a 2D matrix.
The image processing toolkit provides another very short but slow solution:
A = magic(3); T = [-1 1] T = diag(T); M=blockproc(A,[1 1],@(x) x.data.*T);
And finally, an implementation that generates a sparse matrix, which can be useful for large T, since your matrix will contain many zeros:
T=[-1 1]; A=magic(3); %p and q hold the positions where the first element element is stored. Check sparse(p(:),q(:),A(:)) to understand this intermediate step [p,q]=ndgrid(1:numel(T):numel(T)*size(A,1),1:numel(T):numel(T)*size(A,2)); %now p and q are extended to hold the indices for all elements tP=bsxfun(@plus,p(:),0:numel(T)-1); tQ=bsxfun(@plus,q(:),0:numel(T)-1); % tA=bsxfun(@times,A(:),T); M=sparse(tP,tQ,tA);
When T has size nx1, a sparse solution reduces your memory usage by about n / 1.55.