Approach #1
One approach uses np.lib.stride_tricks.as_strided, which gives us a view into the 2D input array and therefore needs no additional memory for the output -
L = 3  # window length, i.e. the number of rows in each window
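Below is a minimal sketch of such a strided view, assuming a 2D NumPy array `a` and window length `L`. The function name `reversed_row_windows` is hypothetical. The trick is to start the view at row `L-1` and use a negative row stride inside each window, so every window lists its rows newest-first without copying any data. The view is made read-only because several windows alias the same rows, and writing into one would silently change the others.

```python
import numpy as np

def reversed_row_windows(a, L):
    """Hypothetical helper: read-only view of all length-L row windows of 2D `a`,
    each window listing its rows in reverse order (last row of the window first)."""
    a = np.asarray(a)
    n, m = a.shape
    s0, s1 = a.strides
    nwin = n - L + 1  # number of windows along the first axis
    # Base pointer at row L-1; stepping +s0 moves to the next window,
    # stepping -s0 walks backwards through the rows of one window.
    return np.lib.stride_tricks.as_strided(
        a[L - 1:], shape=(nwin, L, m), strides=(s0, -s0, s1), writeable=False
    )

# Usage on the sample input
a = np.arange(1, 16).reshape(5, 3)
out = reversed_row_windows(a, L=3)
print(out)
```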
Example input, output -
In [43]: a
Out[43]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [44]: out
Out[44]:
array([[[ 7,  8,  9],
        [ 4,  5,  6],
        [ 1,  2,  3]],

       [[10, 11, 12],
        [ 7,  8,  9],
        [ 4,  5,  6]],

       [[13, 14, 15],
        [10, 11, 12],
        [ 7,  8,  9]]])
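On NumPy 1.20 or newer, a similar read-only view can be built without computing strides by hand, using `np.lib.stride_tricks.sliding_window_view`. This is a swapped-in alternative, not the approach described above, and the helper name `reversed_row_windows_swv` is hypothetical. `sliding_window_view` appends the window axis last, so the sketch moves it next to the window index and then reverses it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def reversed_row_windows_swv(a, L):
    """Hypothetical helper: same windows as the as_strided sketch, via sliding_window_view."""
    a = np.asarray(a)
    # Windows of L consecutive rows; window axis is appended last: (n-L+1, m, L)
    w = sliding_window_view(a, L, axis=0)
    # Move the window axis to position 1 -> (n-L+1, L, m), then reverse rows in each window
    return w.transpose(0, 2, 1)[:, ::-1]

a = np.arange(1, 16).reshape(5, 3)
out = reversed_row_windows_swv(a, 3)
```

The result is still a view (read-only by default), so it keeps the memory advantage of Approach #1 while letting NumPy handle the stride bookkeeping.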
Approach #2
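Alternatively, a somewhat simpler route is to generate all row indices with broadcasting and index into the array with them. Note that this advanced indexing returns a copy rather than a view -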
In [56]: a[np.arange(L-1,-1,-1) + np.arange(a.shape[0]-L+1)[:,None]]
Out[56]:
array([[[ 7,  8,  9],
        [ 4,  5,  6],
        [ 1,  2,  3]],

       [[10, 11, 12],
        [ 7,  8,  9],
        [ 4,  5,  6]],

       [[13, 14, 15],
        [10, 11, 12],
        [ 7,  8,  9]]])
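To make the broadcasting step explicit, here is a small sketch that wraps it in a function; the name `reversed_row_windows_copy` is hypothetical. For the sample input (5 rows, `L = 3`), adding the row offsets `[2, 1, 0]` to the column of window starts `[[0], [1], [2]]` yields the index array `[[2, 1, 0], [3, 2, 1], [4, 3, 2]]`, and indexing `a` with it picks each window's rows newest-first.

```python
import numpy as np

def reversed_row_windows_copy(a, L):
    """Hypothetical helper: length-L row windows of 2D `a`, rows reversed, built by fancy indexing."""
    a = np.asarray(a)
    # Row offsets inside one window, last row first: L-1, ..., 1, 0
    offsets = np.arange(L - 1, -1, -1)
    # One start row per window, as a column so it broadcasts against `offsets`
    starts = np.arange(a.shape[0] - L + 1)[:, None]
    idx = starts + offsets  # shape (n-L+1, L)
    return a[idx]           # shape (n-L+1, L, a.shape[1]); a new array, not a view

a = np.arange(1, 16).reshape(5, 3)
out = reversed_row_windows_copy(a, 3)
```

The trade-off against Approach #1 is memory: the output here holds `L` copies of most rows, whereas the strided view holds none, but the result is an ordinary writeable array with no aliasing between windows.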