Pandas rolling and ewm: completely ignore NaN and use the last N valid data points

Question

I have a question about the relatively new methods .rolling and .ewm. I am using pandas 0.19.0.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, np.nan, 3, 4, 5], 'B': [1, 2, 3, np.nan, 4, 5]})
>>> df
     A    B
0  1.0  1.0
1  2.0  2.0
2  NaN  3.0
3  3.0  NaN
4  4.0  4.0
5  5.0  5.0

>>> df.rolling(window = 3).mean()
     A    B
0  NaN  NaN
1  NaN  NaN
2  NaN  2.0
3  NaN  NaN
4  NaN  NaN
5  4.0  NaN

The desired result is to ignore the NaN values entirely: use the last 3 valid data points for each mean, and leave NaN in the output wherever the input is NaN.

     A    B
0  NaN  NaN
1  NaN  NaN # first two we don't have enough data
2  NaN  2.0 # B column is valid
3  2.0  NaN # completely ignore the nan in df.ix[2,'A'], take the mean of last 3 valid data
4  3.0  3.0
5  4.0  4.0

In .ewm we have a parameter ignore_na, and the code below gives me what I want:

output = df.ewm(com=2, ignore_na=True).mean()
output[df.isnull()] = np.nan
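
For reference, here is a self-contained sketch of the .ewm workaround above. It also checks the result against a drop-then-reindex variant, which should agree because ignore_na=True bases the weights on relative positions among the valid observations. The names masked and dropped are only illustrative, and the sketch assumes the default adjust=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 3, 4, 5],
                   'B': [1, 2, 3, np.nan, 4, 5]})

# Workaround from the question: ewm skips NaN when computing weights,
# but carries the previous average forward into the NaN rows,
# so those rows are masked back to NaN afterwards.
masked = df.ewm(com=2, ignore_na=True).mean()
masked = masked.mask(df.isnull())

# Alternative (illustrative): drop NaN per column, smooth what is left,
# then reindex so the original NaN positions come back as NaN.
dropped = df.apply(lambda x: x.dropna().ewm(com=2).mean().reindex(x.index))

# Both approaches are expected to give the same values.
same = np.allclose(masked.values, dropped.values, equal_nan=True)
```

If same is True, either form can be used; the masked version avoids a per-column apply.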

python pandas

jf328 Dec 6 '16 at 11:53

1 answer

piRSquared · Accepted Answer · 2016-12-06T12:35:44+0000

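The tricky part is that you need the last 3 valid data points, not the last 3 rows. For each column, drop the NaN values, take the rolling mean over what remains, and then reindex back to the original index so the NaN positions are restored: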

df.apply(lambda x: x.dropna().rolling(3).mean().reindex(x.index))
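
As a quick check, the one-liner can be run against the frame from the question. This is a minimal sketch; the helper name rolling_mean_valid is hypothetical and only wraps the expression above.

```python
import numpy as np
import pandas as pd

def rolling_mean_valid(frame, window):
    """Rolling mean over the last `window` valid values in each column.

    Hypothetical helper: NaN rows are dropped before rolling and
    restored (as NaN) by reindexing to the original index.
    """
    return frame.apply(
        lambda x: x.dropna().rolling(window).mean().reindex(x.index)
    )

df = pd.DataFrame({'A': [1, 2, np.nan, 3, 4, 5],
                   'B': [1, 2, 3, np.nan, 4, 5]})

result = rolling_mean_valid(df, 3)

# Desired output from the question, for comparison.
expected = pd.DataFrame({'A': [np.nan, np.nan, np.nan, 2.0, 3.0, 4.0],
                         'B': [np.nan, np.nan, 2.0, np.nan, 3.0, 4.0]})
matches = np.allclose(result.values, expected.values, equal_nan=True)
```

Note that a NaN in one column does not affect the other, because apply processes each column separately.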
