First I will try to answer part 2, and then 1 and 3.
arr = <something> is a simple variable assignment: it rebinds the name arr to a different object. arr[:] = <something> instead copies values into the array that arr already refers to. In the code below, after arr[:] = x , arr is still a memmapped array, whereas after arr = x , arr is a plain ndarray.
    >>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
    >>> type(arr)
    <class 'numpy.core.memmap.memmap'>
    >>> x = np.ones((1,10000000))
    >>> type(x)
    <class 'numpy.ndarray'>
    >>> arr[:] = x
    >>> type(arr)
    <class 'numpy.core.memmap.memmap'>
    >>> arr = x
    >>> type(arr)
    <class 'numpy.ndarray'>
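The same distinction can be checked as a small script rather than an interactive session. This is only a minimal sketch: the scratch file in a temporary directory, the smaller shape, and the variable names (path, original, and the boolean flags) are illustrative assumptions, not part of the session above.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file; the session above used a file named 'mm'.
path = os.path.join(tempfile.mkdtemp(), "mm")

arr = np.memmap(path, dtype="float32", mode="w+", shape=(1, 1000))
original = arr                      # second reference to the memmap object
x = np.ones((1, 1000))

arr[:] = x                          # copies x's values into the mapped buffer
in_place_is_memmap = isinstance(arr, np.memmap)
in_place_same_object = arr is original

arr = x                             # rebinds the name; the memmap is untouched
rebound_is_memmap = isinstance(arr, np.memmap)
rebound_same_object = arr is x

print(in_place_is_memmap, in_place_same_object)
print(rebound_is_memmap, rebound_same_object)
```

After the slice assignment arr is still the original memmap object; after the plain assignment arr is simply another name for x.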
np.argsort returns an array of the same type as its argument. Therefore, in this particular case, one would expect no difference between executing arr = np.argsort(x) and arr[:] = np.argsort(x) . In your code, arr2 will be a memmapped array. But there is a difference.
    >>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
    >>> x = np.ones((1,10000000))
    >>> arr[:] = x
    >>> type(np.argsort(x))
    <class 'numpy.ndarray'>
    >>> type(np.argsort(arr))
    <class 'numpy.core.memmap.memmap'>
OK, now that's different. Using arr[:] = np.argsort(arr) , if we look at the memmapped file, we will see that every change to arr is followed by a change in the file's md5sum.
    >>> import os
    >>> import numpy as np
    >>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
    >>> arr[:] = np.zeros((1,10000000))
    >>> os.system("md5sum mm")
    48e9a108a3ec623652e7988af2f88867  mm
    0
    >>> arr += 1.1
    >>> os.system("md5sum mm")
    b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
    0
    >>> arr[:] = np.argsort(arr)
    >>> os.system("md5sum mm")
    c3607e7de30240f3e0385b59491ac2ce  mm
    0
    >>> arr += 1.3
    >>> os.system("md5sum mm")
    1e6af2af114c70790224abe0e0e5f3f0  mm
    0
We see that arr retains its _mmap attribute.
    >>> arr._mmap
    <mmap.mmap object at 0x7f8e0f086198>
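For systems without the md5sum command, the same experiment can be sketched with hashlib from the standard library. This is an illustrative variant, not the original session: the file_md5 helper, the temporary path, and the smaller shape are assumptions, and flush() is called before each digest so that the check does not depend on when the operating system writes the mapping back.

```python
import hashlib
import os
import tempfile

import numpy as np


def file_md5(path):
    """Return the hex MD5 digest of the file's current contents."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


# Hypothetical scratch file standing in for 'mm'.
path = os.path.join(tempfile.mkdtemp(), "mm")
arr = np.memmap(path, dtype="float32", mode="w+", shape=(1, 1000))
mapped = arr                        # reference used to confirm arr is never rebound

digests = []

arr[:] = np.zeros((1, 1000))        # in-place: writes into the mapping
arr.flush()
digests.append(file_md5(path))

arr += 1.1                          # in-place ufunc: still the same mapping
arr.flush()
digests.append(file_md5(path))

arr[:] = np.argsort(arr)            # result copied into the mapping
arr.flush()
digests.append(file_md5(path))

arr += 1.3
arr.flush()
digests.append(file_md5(path))

for digest in digests:
    print(digest)
print(arr is mapped)
```

Each step changes the file, so the four digests differ, and arr is still the object that was created by np.memmap.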
Now, using arr = np.argsort(arr) , we see that the md5sum stops changing. Even though type(arr) is still a memmapped array, it is a new object, and it seems that it is no longer backed by the memory-mapped file.
    >>> import os
    >>> import numpy as np
    >>> arr = np.memmap('mm', dtype='float32', mode='w+', shape=(1,10000000))
    >>> arr[:] = np.zeros((1,10000000))
    >>> os.system("md5sum mm")
    48e9a108a3ec623652e7988af2f88867  mm
    0
    >>> arr += 1.1
    >>> os.system("md5sum mm")
    b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
    0
    >>> arr = np.argsort(arr)
    >>> os.system("md5sum mm")
    b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
    0
    >>> arr += 1.3
    >>> os.system("md5sum mm")
    b8efebf72a02f9c0b93c0bbcafaf8cb1  mm
    0
    >>> type(arr)
    <class 'numpy.core.memmap.memmap'>
Now the attribute '_mmap' is None.
    >>> arr._mmap
    >>> type(arr._mmap)
    <class 'NoneType'>
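Since _mmap is a private attribute, a check built only on public NumPy calls may be easier to rely on. The sketch below uses np.may_share_memory to ask whether an array overlaps the buffer of the mapping; the still_mapped helper, the scratch path, and the variable names are hypothetical, and the check is a heuristic rather than a guarantee.

```python
import os
import tempfile

import numpy as np


def still_mapped(candidate, mapped):
    """Heuristic: does `candidate` overlap the buffer of the memmap `mapped`?"""
    return bool(np.may_share_memory(candidate, mapped))


# Hypothetical scratch file standing in for 'mm'.
path = os.path.join(tempfile.mkdtemp(), "mm")
mapped = np.memmap(path, dtype="float32", mode="w+", shape=(1, 1000))
mapped[:] = np.arange(1000, dtype="float32")[::-1]

view = mapped[:, :10]               # a slice is a view into the mapping
result = np.argsort(mapped)         # a freshly allocated array of indices

view_is_mapped = still_mapped(view, mapped)
result_is_mapped = still_mapped(result, mapped)
print(view_is_mapped, result_is_mapped)
```

A slice of the memmap still overlaps the mapped buffer, while the output of np.argsort lives in newly allocated memory, so writes to it never reach the file.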
Now part 3. It seems pretty easy to lose the reference to the memmapped object when performing complex operations. My understanding is that you have to break the work into steps and use arr[:] = <something> to store each intermediate result back into the memmapped array.
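One way to break the work into steps might look like the sketch below. It is only an illustration under assumptions: the tiny shape, the sample values, and the names path, mapped, order and on_disk are made up, and using the out= argument of ufuncs is an alternative to arr[:] = that I am adding here, not something from the question. Note that the intermediate result of np.argsort is still created in ordinary memory before it is copied into the mapping.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file standing in for 'mm'.
path = os.path.join(tempfile.mkdtemp(), "mm")
arr = np.memmap(path, dtype="float32", mode="w+", shape=(1, 5))
mapped = arr                        # reference used to confirm arr is never rebound

arr[:] = [[3, 1, 2, 5, 4]]          # fill the mapping in place

np.multiply(arr, 2, out=arr)        # step 1: ufunc writes straight into the mapping
order = np.argsort(arr)             # step 2: intermediate result in ordinary memory
arr[:] = order                      #         copied back into the mapping
np.add(arr, 1, out=arr)             # step 3: another in-place ufunc
arr.flush()                         # push pending changes to the file

# Reopen the file read-only to see what actually reached the disk.
on_disk = np.memmap(path, dtype="float32", mode="r", shape=(1, 5))
print(arr is mapped)
print(on_disk.tolist())
```

Because the name arr is never rebound, every step ends up in the file, which the read-only reopen confirms.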
Using numpy 1.8.1 and Python 3.4.1