As I understand it, a Dask DataFrame is the right way to handle tabular data. I have a table in PostgreSQL, and I know how to load it into a pandas.DataFrame.
I know that odo can be used to convert a pandas.DataFrame to a dask.dataframe. But this is not a lazy operation: such a conversion presumably loads the entire PostgreSQL table into memory, and that is bad. I would prefer to read the rows one by one or in chunks. How can I do that?
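For reference, this is roughly the eager approach I mean (a minimal sketch using `dd.from_pandas`, which I believe is equivalent to what odo does here; the connection string and table name are placeholders for my actual setup):

```python
import pandas as pd
import dask.dataframe as dd
from sqlalchemy import create_engine

# Placeholder credentials and table name.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# This pulls the whole PostgreSQL table into memory at once...
pdf = pd.read_sql_table("my_table", engine)

# ...and only then wraps the already-materialized data in a Dask collection,
# so the laziness of Dask buys me nothing.
ddf = dd.from_pandas(pdf, npartitions=8)
```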
- I have a similar problem with Cassandra. Cassandra, however, is distributed storage, so access to it could presumably be optimized for distributed reads. But how do I do that with Dask?