Expressing a pandas subset operation using pipe

Question


Is there a way to express the pandas operations below using the pipe operator?

    df_a = df[df.index.year != 2000]
    df_b = df_a[(df_a['Month'].isin([3, 4, 5])) & (df_a['region'] == 'USA')]


python pandas

user308827 Jan 27 '16 at 19:01


1 answer

Primer · Accepted Answer · 2016-01-28T09:12:01+0000

Not sure why you want to use pipe for this operation.

pipe is intended to simplify the syntax for passing a DataFrame through a chain of functions, each of which transforms the incoming DataFrame (see the docs).
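As a rough illustration of the kind of chain pipe is designed for, the sketch below passes a DataFrame through two small transformation functions. The functions add_month and flag_region and the sample data are hypothetical, made up for this example only.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
dates = pd.date_range('2015-01-01', periods=4, freq='D')
df = pd.DataFrame({'region': ['USA', 'Non-USA', 'USA', 'USA']}, index=dates)


def add_month(frame):
    # Return a copy with a 'Month' column derived from the DatetimeIndex.
    return frame.assign(Month=frame.index.month)


def flag_region(frame, region):
    # Return a copy with a boolean column marking rows in the given region.
    return frame.assign(in_region=frame['region'] == region)


# Each function receives the DataFrame returned by the previous step;
# extra positional arguments to pipe are forwarded to the function.
result = df.pipe(add_month).pipe(flag_region, 'USA')
print(result)
```

Written without pipe, the same chain would be flag_region(add_month(df), 'USA'), which reads inside-out; pipe keeps the steps in the order they are applied.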

What you are trying to do is filter a DataFrame with several boolean conditions (masks).

To illustrate that using pipe for this operation is somewhat cumbersome:

    import numpy as np
    import pandas as pd

    np.random.seed(123)

    # Generate some data (month-end dates; newer pandas versions spell this frequency 'ME')
    dates = pd.date_range('2014-01-01', '2015-12-31', freq='M')
    df = pd.DataFrame({'region': np.random.choice(['USA', 'Non-USA'], len(dates))}, index=dates)
    df['Month'] = df.index.month
    print(df.head())

                 region  Month
    2014-01-31      USA      1
    2014-02-28  Non-USA      2
    2014-03-31      USA      3
    2014-04-30      USA      4
    2014-05-31      USA      5

Your original filter (with 2014 as the excluded year, to fit this sample data) gives:

    df_a = df[df.index.year != 2014]
    df_b = df_a[(df_a['Month'].isin([3, 4, 5])) & (df_a['region'] == 'USA')]
    print(df_b)

               region  Month
    2015-03-31    USA      3
    2015-05-31    USA      5

Here is how you could use pipe to get the same output:

    def masker(df, mask):
        # Return only the rows of df selected by the boolean mask
        return df[mask]

    mask1 = df.index.year != 2014
    mask2 = df['Month'].isin([3, 4, 5])
    mask3 = df['region'] == 'USA'
    print(df.pipe(masker, mask1).pipe(masker, mask2).pipe(masker, mask3))

               region  Month
    2015-03-31    USA      3
    2015-05-31    USA      5
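One possible variant, sketched here as an assumption rather than as part of the original answer, passes callables instead of precomputed masks, so that each mask is built from the DataFrame arriving at that step. The sample data below is hypothetical and deterministic (alternating regions on month-start dates), so it differs from the random data used above.

```python
import pandas as pd

# Hypothetical, deterministic sample data: 24 month-start dates,
# with regions alternating USA / Non-USA.
dates = pd.date_range('2014-01-01', periods=24, freq='MS')
df = pd.DataFrame({'region': ['USA', 'Non-USA'] * 12}, index=dates)
df['Month'] = df.index.month


def masker(frame, make_mask):
    # make_mask is a callable that builds a boolean mask from the frame
    # it is given, so the mask always matches that frame's rows.
    return frame[make_mask(frame)]


result = (
    df.pipe(masker, lambda d: d.index.year != 2014)
      .pipe(masker, lambda d: d['Month'].isin([3, 4, 5]))
      .pipe(masker, lambda d: d['region'] == 'USA')
)
print(result)
```

Because every mask is computed on the already-filtered frame, this version does not depend on pandas realigning a mask that was built from the full DataFrame. It is still more verbose than combining the masks in a single indexing expression.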

However, in this particular case pandas can handle the filtering much more simply by combining the masks in one indexing expression:

    print(df[mask1 & mask2 & mask3])

               region  Month
    2015-03-31    USA      3
    2015-05-31    USA      5
