The dask API says that map_partitions can be used to "apply a Python function on each DataFrame partition." From this description, and in line with the usual behavior of map, I would expect the return value of map_partitions to be (something like) a list whose length equals the number of partitions. Each element of the list should be the return value of one of the function calls.
However, with the following code, I am not sure what the return value depends on:
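To make that expectation concrete, here is a minimal sketch in plain Python with no dask involved. The names split_into_partitions and naive_map_partitions are hypothetical and only illustrate the "one result per partition" behavior I had in mind; this is not how dask implements map_partitions.

```python
from typing import Any, Callable, List, Sequence


def split_into_partitions(rows: Sequence[Any], npartitions: int) -> List[Sequence[Any]]:
    """Split rows into npartitions contiguous chunks of near-equal size.

    Hypothetical helper for illustration; dask chooses its own divisions.
    """
    size, extra = divmod(len(rows), npartitions)
    chunks = []
    start = 0
    for i in range(npartitions):
        # The first `extra` chunks get one additional row each.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def naive_map_partitions(func: Callable[[Sequence[Any]], Any],
                         partitions: List[Sequence[Any]]) -> List[Any]:
    """Call func exactly once per partition and collect the results in a list."""
    return [func(part) for part in partitions]


partitions = split_into_partitions(list(range(100)), npartitions=3)
results = naive_map_partitions(len, partitions)
print(len(results))  # one entry per partition
```

Under this naive model the function would be called exactly once per partition, and the result could hold arbitrary objects, including None.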
import numpy as np
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
ddf = dd.from_pandas(pdf, npartitions=3)
VAL = pd.Series({'A': 1})

def helper(x):
    # Print on every call so the number of calls is visible
    print('function called\n')
    return VAL

out = ddf.map_partitions(helper).compute()
print(len(out))
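Depending on VAL, I observe the following:
- VAL = pd.Series({'A': 1}) results in 4 function calls (possibly 1 to infer the output dtype and 3 for the partitions) and an output with len == 3 and type pd.Series.
- VAL = pd.DataFrame({'A': [1]}) results in the same numbers, but the resulting type is pd.DataFrame.
- VAL = None results in a TypeError. Why? Couldn't map_partitions be used to do something, rather than to return something?
- VAL = 1 results in 2 function calls. The result of map_partitions is 1.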
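My questions:
- What determines the return value of map_partitions?
- What, besides the number of partitions, influences the number of function calls / what criteria must a function meet to be called exactly once per partition?
- What should a function return if it only "does" something, i.e. a procedure?
- How should a function be written if it returns arbitrary objects?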