As long as you can classify a step as something that accepts a DataFrame (possibly with more arguments) and returns a DataFrame, you can use pipe. Whether there is any advantage to doing so is another question.
Here, for example, you can use
    df\
        .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)\
        .pipe(lambda df_: df_.groupby(df_.x))\
        .mean()
Note that the first step is a lambda that takes 3 arguments: the DataFrame itself, plus x and y, for which 2 and 8 are passed as parameters. This is not the only way to do it; the step is equivalent to
    .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])\
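A minimal sketch to check that equivalence, assuming a small made-up DataFrame (the sample values are hypothetical; only the column names a, b and x come from the snippets above):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [9, 8, 7, 10],
    "x": [0, 0, 1, 1],
})

# Thresholds handed to the lambda through pipe's extra positional arguments.
with_args = (
    df
    .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)
    .pipe(lambda df_: df_.groupby(df_.x))
    .mean()
)

# Thresholds hard-coded inside the lambda.
hard_coded = (
    df
    .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])
    .pipe(lambda df_: df_.groupby(df_.x))
    .mean()
)

# Both chains should produce the same grouped means.
same = with_args.equals(hard_coded)
print(same)
```

Passing the thresholds as arguments is mostly useful when the step is a named function you want to reuse with different values.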
Also note that you can use
    df\
        .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)\
        .groupby('x')\
        .mean()
Here the lambda accepts df_ but works on df, and the second pipe has been replaced by groupby.
The first change works here, but it is fragile. It only works because this is the first stage of the pipe, where df_ and df are the same object. In a later stage, the lambda could receive a DataFrame of one shape and try to filter it with a mask of another shape, for example.
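One way the first change can go wrong is sketched below, assuming made-up data and a hypothetical earlier stage called drop_three. Because the lambda as written indexes the outer df rather than df_, the result of the earlier stage is silently discarded rather than raising an error:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 8, 7, 10], "x": [0, 0, 1, 1]})


def drop_three(df_):
    # Hypothetical earlier stage: removes the row where a == 3,
    # so the frame reaching the next stage is no longer df.
    return df_[df_.a != 3]


# Robust: the lambda filters the frame it was handed.
uses_argument = (
    df
    .pipe(drop_three)
    .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)
)

# Fragile: the lambda ignores df_ and filters the outer df,
# so the row dropped by the earlier stage comes back.
uses_outer = (
    df
    .pipe(drop_three)
    .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)
)

print(uses_argument.a.tolist())
print(uses_outer.a.tolist())
```

The two results differ even though the chains look almost identical, which is why referring only to the lambda's own argument is the safer habit.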
The second change is fine. On the face of it, I think it is more readable. Basically, anything that accepts a DataFrame and returns one can be called either directly or via pipe.
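As a rough illustration of that last point, the sketch below writes the same grouping step three ways, again on made-up data. Passing pd.DataFrame.groupby straight to pipe is just one possible spelling, not something the snippets above use:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 8, 7, 10], "x": [0, 0, 1, 1]})

# Direct method call.
direct = df.groupby("x").mean()

# The same step wrapped in a lambda and routed through pipe.
via_lambda = df.pipe(lambda df_: df_.groupby("x")).mean()

# pipe calls func(df, *args), so the unbound method works as well.
via_method = df.pipe(pd.DataFrame.groupby, "x").mean()

all_equal = direct.equals(via_lambda) and direct.equals(via_method)
print(all_equal)
```

Which form to prefer is largely a readability choice; the direct call is the shortest when the step is already a DataFrame method.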