Preliminary answer that doesn't work
You can easily vectorize the swap operation using arrays of indices (c1, r1, c2, r2) instead of repeating lists of scalar indices.
c1 = np.array(<all the "c1" values>, dtype=int) r1 = np.array(<all the "r1" values>, dtype=int) c2 = np.array(<all the "c2" values>, dtype=int) r2 = np.array(<all the "r2" values>, dtype=int) A[c1, r1], A[c2, r2] = A[c2, r2], A[c1, r1]
Note that this performs all swaps in one pass, which may be different from iterative if the exchange order matters. For this reason, this is an invalid answer to your question .
eg. replacing p1 by p2, then p2 with p3 differs from the substitution of p1 and p2, and p2 and p3 at a time. In the latter case, both p1 and p3 get the original value p2, and p2 gets the last of the values ββbetween p1 and p3, i.e. p3 (in accordance with the order that they appear in the index array).
However, since this is the speed you need, a vector operation (in some way) should be the way to go.
Adding correctness to the above solution
So, how can we do a vectorized replacement, getting the desired result? We can use a hybrid approach, breaking index lists into chunks (as small as possible), where each chunk contains only unique points, thereby guaranteeing order. The permutation of each fragment is performed using scororized-ly, and we only iterate over the pieces.
Here is an example of how this might work:
c1, r1 = np.array([ np.arange(10), np.arange(10) ]) c2, r2 = np.array([ [2,4,6,8,0,1,3,5,7,9], [9,1,6,8,2,2,2,2,7,0] ]) A = np.empty((15,15)) def get_chunk_boundry(c1, r1, c2, r2): a1 = c1 + 1j * r1 a2 = c2 + 1j * r2 set1 = set() set2 = set() for i, (x1, x2) in enumerate(zip(a1, a2)): if x1 in set2 or x2 in set1: return i set1.add(x1); set2.add(x2) return len(c1) while len(c1) > 0: i = get_chunk_boundry(c1, r1, c2, r2) c1b = c1[:i]; r1b = r1[:i]; c2b = c2[:i]; r2b = r2[:i] print 'swapping %d elements' % i A[c1b,r1b], A[c2b,r2b] = A[c2b,r2b], A[c1b,r1b] c1 = c1[i:]; r1 = r1[i:]; c2 = c2[i:]; r2 = r2[i:]
Slicing here will be faster if you start storing indexes as a 2dim (N x 4) array.