Efficient way to partially read a large numpy file?

I have a huge three-dimensional tensor stored in a binary .npy file on disk, which I normally read with np.load . Loading it with np.load quickly consumes most of my memory.

Fortunately, on each run of the program I only need a certain slice of the huge tensor. The slice has a fixed size, and its coordinates are supplied by an external module.

What is the best way to do this? The only approach I could come up with is to store the numpy matrix in a MySQL database, but I'm sure there are much better / simpler ways. I would also be happy to build my 3D tensor file differently if that helps.


Does the answer change if my tensor is sparse in nature?

1 answer

Use numpy.load as usual, but be sure to specify the mmap_mode keyword so that the array stays on disk and only the necessary parts are loaded into memory on access.

mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional. If not None, then memory-map the file using the given mode (see numpy.memmap for a detailed description of the modes). A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.

The modes are described in numpy.memmap :

mode : {'r+', 'r', 'w+', 'c'}, optional. The file is opened in this mode:
'r' Open an existing file for reading only.
'r+' Open an existing file for reading and writing.
'w+' Create or overwrite an existing file for reading and writing.
'c' Copy-on-write: assignments affect data in memory, but changes are not saved to disk. The file on disk is read-only.

* Be careful not to use the 'w+' mode, as it erases the contents of your file.
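A minimal sketch of this approach (the file name and slice indices are illustrative):

```python
import numpy as np

# Create a sample 3-D tensor on disk (stands in for the real large file).
big = np.arange(4 * 5 * 6, dtype=np.float64).reshape(4, 5, 6)
np.save("big_tensor.npy", big)

# Memory-map the file in read-only mode: no data is read into RAM yet.
arr = np.load("big_tensor.npy", mmap_mode="r")

# Slicing reads only the bytes needed for the requested piece.
chunk = arr[1:3, :, 2]
print(chunk.shape)  # (2, 5)
```

If you need the slice as a regular in-memory array (e.g. to modify it without touching the file), copy it with np.array(chunk).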


Source: https://habr.com/ru/post/914149/
