I am trying to write Python code that calls the following Cython function, test1:
    cimport numpy as np

    def test1(np.ndarray[np.int32_t, ndim=2] ndk,
              np.ndarray[np.int32_t, ndim=2] nkw,
              np.ndarray[np.float64_t, ndim=2] phi):
        for _ in xrange(int(1e5)):
            test2(ndk, nkw, phi)

    cdef int test2(np.ndarray[np.int32_t, ndim=2] ndk,
                   np.ndarray[np.int32_t, ndim=2] nkw,
                   np.ndarray[np.float64_t, ndim=2] phi):
        return 1
My plain Python code calls test1 and passes it three NumPy arrays, which are very large (roughly 10^4 × 10^3 elements each). test1, in turn, calls test2, which is defined with the cdef keyword, and passes these arrays on to it. Since test1 needs to call test2 many times (about 10^5) before it returns, and test2 never needs to be called from outside the Cython code, I used cdef instead of def.
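For scale (taking the shapes above literally as 10^4 × 10^3, which is my assumption here), a single int32 array of that shape is already about 40 MB, so any per-call copy or leaked reference adds up very quickly:

```python
import numpy as np

# Hypothetical array matching the shapes described above
ndk = np.zeros((10**4, 10**3), dtype=np.int32)
print(ndk.nbytes)  # 10**7 elements * 4 bytes = 40000000 bytes (~40 MB)
```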
But the problem is that every time test1 calls test2, memory usage grows steadily. I tried calling gc.collect() from outside the Cython code, but it does not help, and eventually the program is killed by the system because it has consumed all available memory. I noticed that this problem only occurs with cdef and cpdef functions; if I change test2 to def, it works fine.
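One way to confirm the steady growth from the Python side is to sample the process's peak resident set size around the call. A minimal sketch using the stdlib resource module (Unix-only; note ru_maxrss is reported in kilobytes on Linux but bytes on macOS), where the call to test1 is a placeholder for the compiled function above:

```python
import gc
import resource

def peak_rss():
    # Peak resident set size of this process so far
    # (kB on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
# ... call test1(ndk, nkw, phi) here ...
gc.collect()  # in my case this had no effect on the growth
after = peak_rss()
print(before, after)
```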
I expected test1 to pass references to these arrays to test2, not new objects. But it looks as though new objects are created for these arrays on every call and passed to test2, and those objects are never reclaimed by the Python GC afterwards.
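For comparison, a plain Python def function receives a reference to the very same array object, with no copy involved; a quick sanity check (takes_array is a made-up helper for illustration):

```python
import numpy as np

def takes_array(a):
    # Return the identity of the object the function actually received
    return id(a)

x = np.zeros((100, 100), dtype=np.int32)
# Same object on both sides of the call: passed by reference, not copied
print(takes_array(x) == id(x))
```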
Did I miss something?