Here is one vectorized solution:
    import numpy as np

    m,n = a.shape
    # idx[i,j] = (j - i) mod n, so row i is shifted right by i positions
    idx = np.mod((n-1)*np.arange(m)[:,None] + np.arange(n), n)
    out = a[np.arange(m)[:,None], idx]
Example input and output:
    In [256]: a
    Out[256]:
    array([[73, 55, 79, 52, 15],
           [45, 11, 19, 93, 12],
           [78, 50, 30, 88, 53],
           [98, 13, 58, 34, 35]])

    In [257]: out
    Out[257]:
    array([[73, 55, 79, 52, 15],
           [12, 45, 11, 19, 93],
           [88, 53, 78, 50, 30],
           [58, 34, 35, 98, 13]])
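As a sanity check on the index formula: (n-1)*i + j is congruent to j - i modulo n, so row i reads from column j - i, which is a right shift by i. The sketch below compares the result against a plain np.roll loop. That loop is only an assumed reference for illustration, not necessarily the loop from the question, and the small array is arbitrary.

```python
import numpy as np

# Small example array; the values are arbitrary
a = np.arange(20).reshape(4, 5)
m, n = a.shape

# idx[i, j] = ((n-1)*i + j) mod n, which equals (j - i) mod n
idx = np.mod((n - 1) * np.arange(m)[:, None] + np.arange(n), n)
out = a[np.arange(m)[:, None], idx]

# Assumed reference: roll row i to the right by i positions in a Python loop
expected = np.array([np.roll(a[i], i) for i in range(m)])

print(np.array_equal(out, expected))
```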
Since you mentioned that you call such a looping procedure several times, create the idx indexing array once and reuse it for every array of the same shape.
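A minimal sketch of that reuse, assuming all arrays share the same shape. The helper name make_roll_idx and the random test arrays are made up for illustration.

```python
import numpy as np

def make_roll_idx(m, n):
    """Hypothetical helper: column indices that shift row i right by i."""
    return np.mod((n - 1) * np.arange(m)[:, None] + np.arange(n), n)

m, n = 4, 5
rows = np.arange(m)[:, None]   # row indices, also built only once
idx = make_roll_idx(m, n)      # built once, reused below

rng = np.random.default_rng(0)
for _ in range(3):
    # Stand-in for the arrays processed on each call
    a = rng.integers(11, 99, size=(m, n))
    out = a[rows, idx]         # only the indexing step is repeated
```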
Further improvement
For repeated use, it is better to build the full linear (flattened) indices once and then use np.take to pick the elements out of the flattened array, for example:
    full_idx = idx + n*np.arange(m)[:,None]
    out = np.take(a,full_idx)
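The linear index works because element (i, j) of an m x n array sits at position i*n + j in the row-major flattened view, and np.take without an axis argument indexes into that flattened view. A short check that both approaches agree, using an arbitrary small array:

```python
import numpy as np

a = np.random.randint(11, 99, (4, 5))
m, n = a.shape

# Per-row column indices: row i shifted right by i
idx = np.mod((n - 1) * np.arange(m)[:, None] + np.arange(n), n)

# Flat position of element (i, idx[i, j]) is i*n + idx[i, j]
full_idx = idx + n * np.arange(m)[:, None]

out_fancy = a[np.arange(m)[:, None], idx]   # 2D advanced indexing
out_take = np.take(a, full_idx)             # indexing into the flattened array

print(np.array_equal(out_fancy, out_take))
```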
Let's see how much of an improvement this gives:
    In [330]: a = np.random.randint(11,99,(600,600))

    In [331]: m,n = a.shape
         ...: idx = np.mod((n-1)*np.arange(m)[:,None] + np.arange(n), n)
         ...:

    In [332]: full_idx = idx + n*np.arange(m)[:,None]

    In [333]: %timeit a[np.arange(m)[:,None], idx]
Around 3x improvement!
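To rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The repeat count is an arbitrary choice, and the absolute timings and the speedup ratio will vary with the machine and NumPy version.

```python
import timeit
import numpy as np

a = np.random.randint(11, 99, (600, 600))
m, n = a.shape
rows = np.arange(m)[:, None]
idx = np.mod((n - 1) * rows + np.arange(n), n)
full_idx = idx + n * rows

number = 50  # arbitrary repeat count for this sketch

# Time 2D advanced indexing against np.take on precomputed linear indices
t_fancy = timeit.timeit(lambda: a[rows, idx], number=number)
t_take = timeit.timeit(lambda: np.take(a, full_idx), number=number)

print(f"a[rows, idx]:          {t_fancy / number * 1e3:.3f} ms per call")
print(f"np.take(a, full_idx):  {t_take / number * 1e3:.3f} ms per call")
```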