In-place mixing of multidimensional arrays

Question

In-place mixing of multidimensional arrays

I am trying to implement a NaN-safe shuffling procedure in Cython, which can move along several axes of a multidimensional matrix of arbitrary dimension.

In the simple case, 1D matrices can simply be shuffled for all indices with values other than NaN using the Fisher-Yeiss algorithm:

def shuffle1D(np.ndarray[double, ndim=1] x): cdef np.ndarray[long, ndim=1] idx = np.where(~np.isnan(x))[0] cdef unsigned int i,j,n,m randint = np.random.randint for i in xrange(len(idx)-1, 0, -1): j = randint(i+1) n,m = idx[i], idx[j] x[n], x[m] = x[m], x[n]

I would like to extend this algorithm to handle large multidimensional arrays without modification (which launches a copy for more complex cases that are not covered here). To do this, I need to get rid of a fixed input dimension, which is not possible using numpy arrays or memory in Cython. Is there a workaround?

Thank you very much in advance!

+5

python arrays numpy multidimensional-array cython

user45893 Sep 29 '14 at 15:10

source share

2 answers

The following algorithm is based on slices where no copy is made, and it should work for any np.ndarray . The main steps:

np.ndindex() used to execute various multidimensional indexes, excluding the one that belongs to the axis you want to shuffle.
apply a tattoo that you have already developed for a 1-D case.

the code:

 def shuffleND(np.ndarray x, axis=-1): cdef np.ndarray[long long, ndim=1] idx cdef unsigned int i, j, n, m if axis==-1: axis = x.ndim-1 all_shape = list(np.shape(x)) shape = all_shape[:] shape.pop(axis) for slices in np.ndindex(*shape): slices = list(slices) axis_slice = slices[:] axis_slice.insert(axis, slice(None)) idx = np.where(~np.isnan(x[tuple(axis_slice)]))[0] for i in range(idx.shape[0]-1, 0, -1): j = randint(i+1) n, m = idx[i], idx[j] slice1 = slices[:] slice1.insert(axis, n) slice2 = slices[:] slice2.insert(axis, m) slice1 = tuple(slice1) slice2 = tuple(slice2) x[slice1], x[slice2] = x[slice2], x[slice1] return x

+2

Saullo castro 30 sept '14 at 7:55

source share

Saullo castro · Accepted Answer · 2014-09-30T10:05:23+0000

Thanks to the comments of @Veedrac, this answer uses more features of Cython.

The pointer array stores the memory address of the axis values
Your algorithm is used with a modification that checks nan values , preventing them from sorting
It will not create a copy for C ordered arrays. In the case of Fortran ordered arrays, the ravel() command will return a copy. This can be improved by creating another array of double pointers to carry x values, possibly with some cache limitation ...

This code is at least one order higher than the other, based on slices.

 from libc.stdlib cimport malloc, free cimport numpy as np import numpy as np from numpy.random import randint cdef extern from "numpy/npy_math.h": bint npy_isnan(double x) def shuffleND(x, int axis=-1): cdef np.ndarray[double, ndim=1] v # view of x cdef np.ndarray[int, ndim=1] strides cdef int i, j cdef int num_axis, pos, stride cdef double tmp cdef double **v_axis if axis==-1: axis = x.ndim-1 shape = list(x.shape) num_axis = shape.pop(axis) v_axis = <double **>malloc(num_axis*sizeof(double *)) for i in range(num_axis): v_axis[i] = <double *>malloc(1*sizeof(double)) try: tmp_strides = [s//x.itemsize for s in x.strides] stride = tmp_strides.pop(axis) strides = np.array(tmp_strides, dtype=np.int32) v = x.ravel() for indices in np.ndindex(*shape): pos = (strides*indices).sum() for i in range(num_axis): v_axis[i] = &v[pos + i*stride] for i in range(num_axis-1, 0, -1): j = randint(i+1) if npy_isnan(v_axis[i][0]) or npy_isnan(v_axis[j][0]): continue tmp = v_axis[i][0] v_axis[i][0] = v_axis[j][0] v_axis[j][0] = tmp finally: free(v_axis) return x

In-place mixing of multidimensional arrays

More articles: