What is map_partitions?

Question

The dask API says that map_partitions can be used to "apply Python function on each DataFrame partition." From this description, and in accordance with the usual behavior of "map", I would expect the return value of map_partitions to be (something like) a list whose length equals the number of partitions. Each element of the list should be one of the return values of the function calls.

However, with the following code, I am not sure what the return value depends on:

import numpy as np
import pandas as pd
import dask.dataframe as dd

# generate example dataframe
pdf = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
ddf = dd.from_pandas(pdf, npartitions=3)

# define helper function for map_partitions; VAL is the return value
VAL = pd.Series({'A': 1})
# VAL = pd.DataFrame({'A': [1]})  # other return values used in this example
# VAL = None
# VAL = 1
def helper(x):
    print('function called\n')
    return VAL

# check result
out = ddf.map_partitions(helper).compute()
print(len(out))

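VAL = pd.Series({'A': 1}) results in 4 function calls (probably one to infer the output dtype and 3 for the partitions) and an output with len == 3 and type pd.Series.
VAL = pd.DataFrame({'A': [1]}) results in the same numbers; however, the resulting type is pd.DataFrame.
VAL = None causes a TypeError... why? Couldn't map_partitions be used to do something rather than to return something?
VAL = 1 results in only 2 function calls. The result of map_partitions is the integer 1.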

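Therefore, I want to ask some questions:

How is the return value of map_partitions determined?
What influences the number of function calls besides the number of partitions? / What criteria does a function have to fulfill to be called once per partition?
What should a function return if it only "does" something, i.e. a procedure?
How should a function be designed if it returns arbitrary objects?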

python pandas dask

Arco Bast 29 . '16 21:38

MRocklin · Accepted Answer · 2016-08-30T12:26:27+0000

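The Dask DataFrame.map_partitions function returns a new Dask DataFrame or Series, based on the output type of the mapped function. See the API documentation for a thorough explanation.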

How is the return value of map_partitions determined?
See the API documentation referred to above.
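What influences the number of function calls besides the number of partitions? / What criteria does a function have to fulfill to be called once per partition?
You are correct that the function is called once immediately to guess the dtypes/columns of the output. You can avoid this by specifying the meta= keyword directly. Other than that, the function is called once per partition.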
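What should a function return if it only "does" something, i.e. a procedure?
You could always return an empty dataframe. You might also want to consider transforming your dataframe into a sequence of dask.delayed objects, which are more typically used for ad-hoc computations.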
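How should a function be designed if it returns arbitrary objects?
If your function doesn't return series/dataframes, then I recommend converting your dataframe into a sequence of dask.delayed objects with the DataFrame.to_delayed method.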
