In Dask DataFrame.apply(), receiving n rows of value 1 before the actual rows are processed

In the code snippet below, I expect the logs to print numbers 0 through 4. I understand that the numbers may not be in the correct order, since the task will be split into several parallel operations.

Code snippet:

from dask import dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(5),
                   'B': np.arange(5),
                   'C': np.arange(5)})

ddf = dd.from_pandas(df, npartitions=1)

def aggregate(x):
    print('B val received: ' + str(x.B))
    return x

ddf.apply(aggregate, axis=1).compute()

But when the above code is executed, I see this instead:

B val received: 1
B val received: 1
B val received: 1
B val received: 0
B val received: 0
B val received: 1
B val received: 2
B val received: 3
B val received: 4

Instead of 0 through 4, I first see a run of 1s and then an extra 0. I have noticed these "extra" rows of value 1 every time I set up a Dask DataFrame and run apply on it.

Printing the dataframe shows no additional rows with a value of 1 in every column:

   A  B  C
0  0  0  0
1  1  1  1
2  2  2  2
3  3  3  3
4  4  4  4



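Dask does not know what your function returns, so before it processes the real rows it calls the function on a small set of dummy data. The extra rows of value 1 come from that call, not from your dataframe.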


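@Grr is correct. Dask.dataframe does not know what your function will produce, but it needs to know this to some extent in order to stay lazy, so dask.dataframe runs your function on a small sample dataset to see the dtypes etc.

You can avoid this by providing enough information about the output with the meta= keyword (see the DataFrame.apply docstring), so that Dask.dataframe does not need to infer it.

From the docstring: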
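The probing step can be imitated with pandas alone. This is only a rough sketch of the idea, not Dask's actual code: the `fake` frame of two rows filled with 1 is a hypothetical stand-in for the sample data Dask builds internally, and `seen` is a helper list added here to record what the function receives.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(5),
                   'B': np.arange(5),
                   'C': np.arange(5)})

seen = []  # records every B value passed to the function

def aggregate(x):
    seen.append(int(x.B))
    return x

# Hypothetical stand-in for Dask's internal sample: same column names and
# dtypes as df, but only two rows, all filled with 1.
fake = pd.DataFrame({name: np.ones(2, dtype=dtype)
                     for name, dtype in df.dtypes.items()})

# Running the function on the fake frame reveals the shape of the output
# (column names and dtypes) without touching the real data.
inferred = fake.apply(aggregate, axis=1)

print(seen)             # only 1s: the function never saw the real rows
print(inferred.dtypes)  # the output metadata learned from the probe
```

Every value logged during this step is 1 because the function only ever sees the fake rows, which matches the unexpected lines in the question's output.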

meta: pd.DataFrame, pd.Series, dict, iterable, tuple, optional

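An empty pd.DataFrame or pd.Series that matches the dtypes and column names of the output. This metadata is necessary for many algorithms in dask dataframe to work. For ease of use, some alternative inputs are also available. Instead of a DataFrame, a dict of {name: dtype} or iterable of (name, dtype) can be provided. Instead of a series, a tuple of (name, dtype) can be used. If not provided, dask will try to infer the metadata. This may lead to unexpected results, so providing meta is recommended. For more information, see dask.dataframe.utils.make_meta.

So, for example, you could do the following: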

meta = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]}, 
                    columns=['A', 'B', 'C'])
ddf.apply(aggregate, axis=1, meta=meta)

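Or, in this case, because your function does not change the column names or dtypes, you could just reuse the metadata of the original dataframe:

# _meta is the (private) attribute holding the dataframe's empty metadata frame
ddf.apply(aggregate, axis=1, meta=ddf._meta)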

Source: https://habr.com/ru/post/1674846/

