Expectation: When I split a data frame into partitions, I expect the rows to be distributed roughly evenly across them. When I then write that dataframe to CSV, I expect the resulting n files (10 in this case) to have approximately equal length.
Reality: When I run the code below, the lines are not distributed uniformly: every line ends up in export_results-0.csv, and the remaining 9 CSVs are empty.
Question: Is there any additional configuration I need to set so that the rows are distributed among all partitions?
from dask.distributed import Client
import dask.dataframe as dd
import numpy as np  # needed for np.random.random and np.arange below
import pandas as pd
client = Client('tcp://10.0.0.60:8786')
df = pd.DataFrame({'geom': np.random.random(1000)}, index=np.arange(1000))
sd = dd.from_pandas(df, npartitions=100)
tall = dd.merge(sd.assign(key=0), sd.assign(key=0), on='key').drop('key', axis=1)
# to_csv computes by default and writes one file per partition, so no .compute() is needed
tall.to_csv('export_results-*.csv')
: 1000 , 100 000 ( , , 100k +).