I have a calculation that expects the pandas framework as an input frame. I would like to run this calculation for data stored in a netCDF file that extends up to 51 GB. I am currently opening a file with xarray.open_dataset
and using chunks (I understand that this open file is actually a dask array, so it only loads chunks of data into memory at a time). However, I don't seem to be able to take advantage of this lazy loading, since I need to convert the xarray data to the pandas framework in order to perform my calculation, and I understand that at that moment all the data is loaded into memory (which is bad).
So, I think the long story is short, my question is: how can I get from the xarray dataset into a pandas dataframe without any intermediate steps that load all my data into memory? I saw how dask works with pandas.read_csv
, and I see that it works with xarray, but I'm not sure how to convert the already open xarray netCDF dataset to the pandas framework in pieces.
Thank you and sorry for the vague question!
source share