Pandas subset expression using pipe

I have a DataFrame, which I subset as follows:

       a  b  x  y
    0  1  2  3 -1
    1  2  4  6 -2
    2  3  6  6 -3
    3  4  8  3 -4

    df = df[(df.a >= 2) & (df.b <= 8)]
    df = df.groupby(df.x).mean()
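
For anyone who wants to experiment, here is a minimal runnable sketch of the setup above, assuming the four-row frame shown and a recent pandas version. It groups by the column name `"x"` rather than by `df.x`; the groups are the same either way.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# Step 1: keep rows where a >= 2 and b <= 8.
subset = df[(df.a >= 2) & (df.b <= 8)]

# Step 2: average the remaining columns within each x group.
result = subset.groupby("x").mean()
print(result)
```

With this data rows 1 to 3 survive the filter, so the `x == 6` group averages `a` to 2.5 and `y` to -2.5, while the `x == 3` group contains only row 3.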

How can I express this using the pandas pipe method?

    df = (df
          .pipe((x.a > 2) & (x.b < 6)
          .groupby(df.x)
          .apply(lambda x: x.mean())

3 answers

As long as a step can be framed as something that accepts a DataFrame (possibly with more arguments) and returns a DataFrame, you can use pipe. Whether there is any advantage to doing so is another question.
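
Concretely, `df.pipe(f, *args, **kwargs)` calls `f(df, *args, **kwargs)`. A small sketch of that contract, using a hypothetical helper `filter_ab` (not from the question) that takes one positional and one keyword argument:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# Hypothetical helper: takes a DataFrame first and returns a DataFrame.
def filter_ab(df_, min_a, max_b=8):
    return df_[(df_.a >= min_a) & (df_.b <= max_b)]

# pipe forwards the extra positional and keyword arguments to the function.
via_pipe = df.pipe(filter_ab, 2, max_b=8)
direct = filter_ab(df, 2, max_b=8)
print(via_pipe.equals(direct))  # True
```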

Here, for example, you can use

    df\
        .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)\
        .pipe(lambda df_: df_.groupby(df_.x))\
        .mean()

Note that the first step is a lambda that takes 3 arguments, with 2 and 8 passed as the extra parameters. This is not the only way to do it; it is equivalent to

    .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])\
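
A quick check that the two forms of the first step give the same result, assuming the sample data from the question; the rest of the chain is the one from this answer:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# Thresholds passed to the lambda as extra pipe arguments.
with_args = (df
             .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)
             .pipe(lambda df_: df_.groupby(df_.x))
             .mean())

# Thresholds written directly into the lambda.
with_constants = (df
                  .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])
                  .pipe(lambda df_: df_.groupby(df_.x))
                  .mean())

print(with_args.equals(with_constants))  # True
```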

Also note that you can use

    df\
        .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)\
        .groupby('x')\
        .mean()

Here the lambda accepts df_ but operates on df, and the second pipe has been replaced by groupby.

  • The first change works here, but it is brittle. It only works because this is the first stage of the pipe. If it were a later stage, it could, for example, receive a DataFrame of one shape and try to filter it with a mask of another shape.

  • The second change is fine. On the face of it, I think it is more readable. Basically, anything that accepts a DataFrame and returns one can be called either directly or via pipe.
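
A minimal sketch of the failure mode described in the first point, assuming the question's sample data: after a `groupby(...).mean()` stage the frame is indexed by `x` (3 and 6), so a mask built from the original `df` (indexed 0 to 3) cannot be aligned with it, and pandas is expected to raise an indexing error instead of filtering.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# A later stage: after groupby/mean the frame is indexed by x (3 and 6).
grouped = df.groupby("x").mean()

# Brittle: the lambda ignores df_ and builds its mask from the outer df,
# whose index is 0..3, so the mask cannot be aligned with grouped.
try:
    grouped.pipe(lambda df_: df_[df.a >= 2])
    failed = False
except Exception as exc:  # expected: pandas raises an IndexingError here
    failed = True
    print(type(exc).__name__, exc)

# Robust: build the mask from the frame the stage actually receives.
ok = grouped.pipe(lambda df_: df_[df_.a >= 2])
print(ok)
```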

You can try this, but I think it is more complicated:

    print(df[(df.a >= 2) & (df.b <= 8)].groupby(df.x).mean())

         a  b  x    y
    x
    3  4.0  8  3 -4.0
    6  2.5  5  6 -2.5

    def masker(df, mask):
        return df[mask]

    mask1 = (df.a >= 2)
    mask2 = (df.b <= 8)
    print(df.pipe(masker, mask1).pipe(masker, mask2).groupby(df.x).mean())

         a  b  x    y
    x
    3  4.0  8  3 -4.0
    6  2.5  5  6 -2.5
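
One caveat with `masker` as written: `mask2` is computed on the full `df` and then applied to the already-filtered frame, so pandas has to align it by index (and may warn about reindexing the boolean key). A hypothetical variant, `masker_fn` (not part of the answer), takes a function that builds the mask from whatever frame it receives:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# Hypothetical variant of masker: the mask is computed from the frame
# that is passed in, so it always shares that frame's index.
def masker_fn(df_, make_mask):
    return df_[make_mask(df_)]

result = (df
          .pipe(masker_fn, lambda d: d.a >= 2)
          .pipe(masker_fn, lambda d: d.b <= 8)
          .groupby("x")
          .mean())
print(result)
```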

I believe this approach makes your filtering steps and the subsequent operations clear. Using loc[(mask1) & (mask2)] is probably more efficient.

    >>> (df
    ...  .pipe(lambda x: x.loc[x.a >= 2])
    ...  .pipe(lambda x: x.loc[x.b <= 8])
    ...  .pipe(pd.DataFrame.groupby, 'x')
    ...  .mean()
    ... )
         a  b    y
    x
    3  4.0  8 -4.0
    6  2.5  5 -2.5

As an alternative:

    (df
     .pipe(lambda x: x.loc[x.a >= 2])
     .pipe(lambda x: x.loc[x.b <= 8])
     .groupby('x')
     .mean()
    )
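
A sketch comparing the chained `.loc` filters with a single combined mask and with the `pipe(pd.DataFrame.groupby, 'x')` form, assuming the question's sample data. Passing the unbound method works because `pipe` calls `pd.DataFrame.groupby(frame, 'x')`, which is the same as `frame.groupby('x')`. The last variant uses `.loc` with a callable, which is standard pandas but was not used in any of the answers:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})

# Two chained .loc filters, as in this answer.
chained = (df
           .pipe(lambda x: x.loc[x.a >= 2])
           .pipe(lambda x: x.loc[x.b <= 8])
           .groupby("x")
           .mean())

# One .loc with both conditions combined into a single mask.
combined = (df
            .pipe(lambda x: x.loc[(x.a >= 2) & (x.b <= 8)])
            .groupby("x")
            .mean())

# Unbound method: pipe calls pd.DataFrame.groupby(frame, "x").
unbound = (df
           .pipe(lambda x: x.loc[(x.a >= 2) & (x.b <= 8)])
           .pipe(pd.DataFrame.groupby, "x")
           .mean())

# Not from the answers: .loc accepts a callable, so pipe is not needed here.
no_pipe = df.loc[lambda d: (d.a >= 2) & (d.b <= 8)].groupby("x").mean()

print(chained.equals(combined), combined.equals(unbound), combined.equals(no_pipe))
```

All four variants return the same frame for this data; whether the single combined mask is measurably faster will depend on the size of the data.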

Source: https://habr.com/ru/post/1244062/

