Applying a rolling function to a data frame

Question

I have the following data frame C.

    >>> C
                  a    b    c
    2011-01-01    0    0  NaN
    2011-01-02   41   12  NaN
    2011-01-03   82   24  NaN
    2011-01-04  123   36  NaN
    2011-01-05  164   48  NaN
    2011-01-06  205   60    2
    2011-01-07  246   72    4
    2011-01-08  287   84    6
    2011-01-09  328   96    8
    2011-01-10  369  108   10

I would like to add a new column d by applying a rolling function over a fixed window (here 6), where the value of c is held fixed for the whole window (taken from the window's last row, or date). One pass of this rolling function should look like this (pseudocode):

                  a    b    c  d
    2011-01-01    0    0  NaN  a + b*2   (a, b from this row; '2' is from 'c' on 2011-01-06)
    2011-01-02   41   12  NaN  a + b*2   (a, b from this row; '2' is still from 2011-01-06)
    2011-01-03   82   24  NaN  a + b*2
    2011-01-04  123   36  NaN  a + b*2
    2011-01-05  164   48  NaN  a + b*2
    2011-01-06  205   60    2  a + b*2
    2011-01-07  246   72    4
    2011-01-08  287   84    6
    2011-01-09  328   96    8
    2011-01-10  369  108   10

After this "loop" I want to take all 6 calculated values in d and pass them to a function, which returns a single value that should be stored in another column, say e:

                  a    b    c  d                            e
    2011-01-01    0    0  NaN  a + b*2  ---|                NaN
    2011-01-02   41   12  NaN  a + b*2     |                NaN
    2011-01-03   82   24  NaN  a + b*2     | These values   NaN
    2011-01-04  123   36  NaN  a + b*2     | are input to   NaN
    2011-01-05  164   48  NaN  a + b*2     | function       NaN
    2011-01-06  205   60    2  a + b*2  ---| yielding       X
    2011-01-07  246   72    4                value X in
    2011-01-08  287   84    6                column 'e'
    2011-01-09  328   96    8
    2011-01-10  369  108   10

This procedure is then repeated for the next window (again of length 6), for example:

                  a    b    c  d         e
    2011-01-01    0    0  NaN
    2011-01-02   41   12  NaN  a + b*4   (a, b from this row; '4' is from 'c', now from 2011-01-07)
    2011-01-03   82   24  NaN  a + b*4   (a, b from this row; '4' is still from 2011-01-07)
    2011-01-04  123   36  NaN  a + b*4
    2011-01-05  164   48  NaN  a + b*4
    2011-01-06  205   60    2  a + b*4   X
    2011-01-07  246   72    4  a + b*4
    2011-01-08  287   84    6
    2011-01-09  328   96    8
    2011-01-10  369  108   10

                  a    b    c  d                            e
    2011-01-01    0    0  NaN                               NaN
    2011-01-02   41   12  NaN  a + b*4  ---|                NaN
    2011-01-03   82   24  NaN  a + b*4     | These values   NaN
    2011-01-04  123   36  NaN  a + b*4     | are input to   NaN
    2011-01-05  164   48  NaN  a + b*4     | function       NaN
    2011-01-06  205   60    2  a + b*4     | yielding       X
    2011-01-07  246   72    4  a + b*4  ---| value Y in     Y
    2011-01-08  287   84    6                column 'e'
    2011-01-09  328   96    8
    2011-01-10  369  108   10

I hope that's clear enough.

Thanks, N

python pandas dataframe apply

gussilago Jan 28 '15 at 10:52

1 answer

unutbu · Accepted Answer · 2015-01-28T15:00:28+0000

You can use pd.rolling_apply:

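    import numpy as np
    import pandas as pd

    df = pd.read_table('data', sep=r'\s+')

    def foo(x, df):
        # x holds the integer positions of the rows in the current window
        window = df.iloc[x]
        # print(window)
        # take c from the last row of the window
        c = df.ix[int(x[-1]), 'c']
        dvals = window['a'] + window['b']*c
        return bar(dvals)

    def bar(dvals):
        # reduce the window's d values to a single number
        # print(dvals)
        return dvals.mean()

    # roll over row positions rather than over a column of values
    df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo, args=(df,))
    print(df)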

gives

                  a    b    c       e
    2011-01-01    0    0  NaN     NaN
    2011-01-02   41   12  NaN     NaN
    2011-01-03   82   24  NaN     NaN
    2011-01-04  123   36  NaN     NaN
    2011-01-05  164   48  NaN     NaN
    2011-01-06  205   60    2   162.5
    2011-01-07  246   72    4   311.5
    2011-01-08  287   84    6   508.5
    2011-01-09  328   96    8   753.5
    2011-01-10  369  108   10  1046.5
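As a cross-check on these numbers, the same procedure can be written as an explicit loop over window end positions. This is only a sketch: the frame is rebuilt inline from the values in the question rather than read from the 'data' file, the names window_size and end are introduced here for illustration, and the mean stands in for whatever function is applied to the d values.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed values, not read from 'data').
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

window_size = 6  # hypothetical name for the fixed window length
e = np.full(len(df), np.nan)

for end in range(window_size - 1, len(df)):
    # Rows of the current window, selected by position.
    window = df.iloc[end - window_size + 1 : end + 1]
    # c is held fixed for the whole window: taken from the window's last row.
    c = df["c"].iloc[end]
    # The intermediate 'd' values for this window.
    d = window["a"] + window["b"] * c
    # Reduce the six d values to one number (mean, as in bar above).
    e[end] = d.mean()

df["e"] = e
print(df)
```

For the first complete window (ending 2011-01-06, c = 2) the d values are 0, 65, 130, 195, 260 and 325, whose mean is 162.5.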

The args and kwargs parameters were added to rolling_apply in Pandas version 0.14.0.
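Later pandas releases removed both pd.rolling_apply and the .ix indexer, so the code above will not run on a current installation. A minimal sketch of the same idea with the newer Series.rolling(...).apply method follows. It assumes a pandas version where apply accepts raw and args, rebuilds the example frame inline instead of reading 'data', and uses the hypothetical name positions for the helper series.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed values, not read from 'data').
df = pd.DataFrame(
    {
        "a": np.arange(10) * 41,
        "b": np.arange(10) * 12,
        "c": [np.nan] * 5 + [2.0, 4.0, 6.0, 8.0, 10.0],
    },
    index=pd.date_range("2011-01-01", periods=10, freq="D"),
)

def bar(dvals):
    # Reduce the window's d values to a single number.
    return dvals.mean()

def foo(x, frame):
    # rolling.apply hands over floats, so convert back to integer positions.
    pos = x.astype(int)
    window = frame.iloc[pos]
    # c comes from the last row of the window (replaces the old .ix lookup).
    c = frame["c"].iloc[pos[-1]]
    dvals = window["a"] + window["b"] * c
    return bar(dvals)

# Roll over row positions, as the original answer does with np.arange(len(df)).
positions = pd.Series(np.arange(len(df)), index=df.index)
df["e"] = positions.rolling(6).apply(foo, raw=True, args=(df,))
print(df)
```

With raw=True the function receives a plain NumPy array for each window, which keeps it close to what rolling_apply passed to foo.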

Since df is a global variable in my example above, it is not necessary to pass it to foo as an argument. You can simply remove df from the def foo line and omit args=(df,) from the rolling_apply call.

However, there are times when df is not defined in a scope accessible to foo. In that case there is a simple workaround: make a closure:

    def foo(df):
        def inner_foo(x):
            window = df.iloc[x]
            # print(window)
            c = df.ix[int(x[-1]), 'c']
            dvals = window['a'] + window['b']*c
            return bar(dvals)
        return inner_foo

    df['e'] = pd.rolling_apply(np.arange(len(df)), 6, foo(df))
