In-place mixing of multidimensional arrays

I am trying to implement a NaN-safe shuffling procedure in Cython, which can move along several axes of a multidimensional matrix of arbitrary dimension.

In the simple case, 1D matrices can simply be shuffled for all indices with values ​​other than NaN using the Fisher-Yeiss algorithm:

def shuffle1D(np.ndarray[double, ndim=1] x): cdef np.ndarray[long, ndim=1] idx = np.where(~np.isnan(x))[0] cdef unsigned int i,j,n,m randint = np.random.randint for i in xrange(len(idx)-1, 0, -1): j = randint(i+1) n,m = idx[i], idx[j] x[n], x[m] = x[m], x[n] 

I would like to extend this algorithm to handle large multidimensional arrays without modification (which launches a copy for more complex cases that are not covered here). To do this, I need to get rid of a fixed input dimension, which is not possible using numpy arrays or memory in Cython. Is there a workaround?

Thank you very much in advance!

+5
source share
2 answers

Thanks to the comments of @Veedrac, this answer uses more features of Cython.

  • The pointer array stores the memory address of the axis values
  • Your algorithm is used with a modification that checks nan values , preventing them from sorting
  • It will not create a copy for C ordered arrays. In the case of Fortran ordered arrays, the ravel() command will return a copy. This can be improved by creating another array of double pointers to carry x values, possibly with some cache limitation ...

This code is at least one order higher than the other, based on slices.

 from libc.stdlib cimport malloc, free cimport numpy as np import numpy as np from numpy.random import randint cdef extern from "numpy/npy_math.h": bint npy_isnan(double x) def shuffleND(x, int axis=-1): cdef np.ndarray[double, ndim=1] v # view of x cdef np.ndarray[int, ndim=1] strides cdef int i, j cdef int num_axis, pos, stride cdef double tmp cdef double **v_axis if axis==-1: axis = x.ndim-1 shape = list(x.shape) num_axis = shape.pop(axis) v_axis = <double **>malloc(num_axis*sizeof(double *)) for i in range(num_axis): v_axis[i] = <double *>malloc(1*sizeof(double)) try: tmp_strides = [s//x.itemsize for s in x.strides] stride = tmp_strides.pop(axis) strides = np.array(tmp_strides, dtype=np.int32) v = x.ravel() for indices in np.ndindex(*shape): pos = (strides*indices).sum() for i in range(num_axis): v_axis[i] = &v[pos + i*stride] for i in range(num_axis-1, 0, -1): j = randint(i+1) if npy_isnan(v_axis[i][0]) or npy_isnan(v_axis[j][0]): continue tmp = v_axis[i][0] v_axis[i][0] = v_axis[j][0] v_axis[j][0] = tmp finally: free(v_axis) return x 
+4
source

The following algorithm is based on slices where no copy is made, and it should work for any np.ndarray . The main steps:

  • np.ndindex() used to execute various multidimensional indexes, excluding the one that belongs to the axis you want to shuffle.
  • apply a tattoo that you have already developed for a 1-D case.

the code:

 def shuffleND(np.ndarray x, axis=-1): cdef np.ndarray[long long, ndim=1] idx cdef unsigned int i, j, n, m if axis==-1: axis = x.ndim-1 all_shape = list(np.shape(x)) shape = all_shape[:] shape.pop(axis) for slices in np.ndindex(*shape): slices = list(slices) axis_slice = slices[:] axis_slice.insert(axis, slice(None)) idx = np.where(~np.isnan(x[tuple(axis_slice)]))[0] for i in range(idx.shape[0]-1, 0, -1): j = randint(i+1) n, m = idx[i], idx[j] slice1 = slices[:] slice1.insert(axis, n) slice2 = slices[:] slice2.insert(axis, m) slice1 = tuple(slice1) slice2 = tuple(slice2) x[slice1], x[slice2] = x[slice2], x[slice1] return x 
+2
source

Source: https://habr.com/ru/post/1203635/


All Articles