How to load a GRIB file directory into a Dask array

Suppose I have a directory with thousands of GRIB files. I want to load these files into a dask array so that I can query them. How can I do this? The attempt below seems to work, but it requires every GRIB file to be opened, and it takes a long time and all of my memory to run. There must be a better way.

My attempt:

import dask.array as da
import gdal
import glob
import os


def load(filedir):
    files = sorted(glob.glob(os.path.join(filedir, '*.grb')))
    # Eagerly opens and reads every file into memory before wrapping
    # it in a dask array; this is the bottleneck
    data = [da.from_array(gdal.Open(f).ReadAsArray(),
                          chunks=[500, 500, 500], name=f)
            for f in files]
    return da.stack(data, axis=0)

file_dir = ...
array = load(file_dir)
1 answer

The best way to do this is to use dask.delayed. You create a delayed function that reads a single array, and then compose a dask array from these delayed objects using da.from_delayed. Something like:

import dask
import dask.array as da
import gdal

# This function isn't run until compute time
@dask.delayed(pure=True)
def load(file):
    return gdal.Open(file).ReadAsArray()


# Create several delayed objects, then turn each into a dask
# array. Note that you need to know the shape and dtype of each
# file
data = [da.from_delayed(load(f), shape=shape_of_f, dtype=dtype_of_f)
        for f in files]

x = da.stack(data, axis=0)
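
If all the files share the same shape and dtype, a cheap approach is to probe a single representative file up front instead of opening every one. A minimal sketch, assuming homogeneous files (the single-pixel window read used to discover the dtype is my assumption, not part of the original answer):

sample = gdal.Open(files[0])

# ReadAsArray returns (bands, rows, cols) for a multi-band raster
shape = (sample.RasterCount, sample.RasterYSize, sample.RasterXSize)
# Read a 1x1 window to discover the numpy dtype without loading the file
dtype = sample.ReadAsArray(0, 0, 1, 1).dtype

These values can then stand in for the shape_of_f and dtype_of_f placeholders above.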

Note that this results in a single task for loading each file. If the individual files are large, you may want to chunk them inside the load function itself. I'm not familiar with gdal, but from a brief look at ReadAsArray it seems this could be done with the xoff/yoff/xsize/ysize parameters (not sure). You would have to write this code yourself, but it may be more performant for large files.
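
As a concrete illustration of that idea, here is a hedged sketch that reads each file lazily in horizontal strips; the load_block and load_chunked names and the 500-row strip size are assumptions for illustration only, and it chunks along rows alone for simplicity:

# Sketch only: lazily read one horizontal strip of a file.
# Window reads use Dataset.ReadAsArray(xoff, yoff, xsize, ysize).
@dask.delayed(pure=True)
def load_block(file, yoff, ysize):
    ds = gdal.Open(file)
    return ds.ReadAsArray(0, yoff, ds.RasterXSize, ysize)


def load_chunked(file, nrows, shape, dtype):
    # shape is (bands, rows, cols); split the rows into strips of nrows
    bands, rows, cols = shape
    blocks = []
    for yoff in range(0, rows, nrows):
        ysize = min(nrows, rows - yoff)
        blocks.append(da.from_delayed(load_block(file, yoff, ysize),
                                      shape=(bands, ysize, cols),
                                      dtype=dtype))
    return da.concatenate(blocks, axis=1)


x = da.stack([load_chunked(f, 500, shape, dtype) for f in files], axis=0)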

Alternatively, you can use the code above and then call rechunk to split everything into smaller chunks. Each file would still be read in a single task, but subsequent operations could work with the smaller chunks. Whether this is worth it depends on the size of your individual files.

x = x.rechunk((500, 500, 500))  # or whatever chunks you want
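
Either way, everything stays lazy until you ask for a result. A hypothetical usage sketch: computing a slice only runs the load tasks for the files that slice actually touches.

# Nothing is read yet; this just builds a task graph
subset = x[:10].mean(axis=0)

# Only the first 10 files are opened when compute() is called
result = subset.compute()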

Source: https://habr.com/ru/post/1676677/

