Optimization of iterative calculation of values based on growth rate

Question

This is my data file:

    Date        A    new_growth_rate
    2011/01/01  100
    2011/02/01  101
    .
    2012/01/01  120  0.035
    2012/02/01  121  0.035
    .
    2013/01/01  131  0.036
    2013/01/01  133  0.038

This is what I need:

    Date        A       new_growth_rate
    2011/01/01  100
    2011/02/01  101
    .           .
    2012/01/01  103.62  .035    A=100/(1-0.035)
    2012/02/01  104.66  .035    A=101/(1-0.035)
    .           .
    2013/01/01  107.49  .036    A=103.62/(1-0.036)
    2013/02/01  108.79  .038    A=104.66/(1-0.038)

I need to calculate a value based on the growth rate for each column. I have a dataframe with 400 columns and their corresponding growth rates.

Each value is calculated from the value one year earlier and the current month's growth rate: (value one year earlier) / (1 - current month growth rate). The calculated value is then used to get the value for the following year, and so on. The time series contains 30 years of data.

I am currently using two for loops: the outer one selects each column, and the inner one iterates over the time period for that column, using the values calculated in earlier iterations. It takes several hours to process more than 500 rows and 400 columns. Is there a better way to do this?

My code snippet is below:

Here grpby holds the list of columns in the dataframe, and grwth holds the values and growth rates.

    import pandas as pd
    from dateutil import relativedelta

    df_new = pd.DataFrame()
    for i, row in grpby.iterrows():
        # Rows that belong to the current column
        df_csr = grwth.loc[grwth['A'] == row['A']].copy()
        a = pd.to_datetime("2011-12-01", format='%Y-%m-%d')
        b = a
        while b < a + relativedelta.relativedelta(months=420):
            b = b + relativedelta.relativedelta(months=1)
            # Value from the same month one year earlier
            val = df_csr.loc[df_csr['Date'] == (b + relativedelta.relativedelta(months=-12))]
            val2 = val['Val'].iloc[0]
            # Growth rate for the current month
            grwth_r = df_csr.loc[df_csr['Date'] == b, 'new_growth_rate']
            grwth_r2 = grwth_r.iloc[0]
            df_csr.loc[df_csr['Date'] == b, 'Val'] = val2 / (1 - grwth_r2)
        df_new = pd.concat([df_new, df_csr])

python pandas dataframe

Sanjay Nov 30 '16 at 4:36

1 answer

Dark · Answer 1 · 2017-08-09T18:11:16+0000

You can set the date as the index and then use a simple loop over the years, assigning each year's values from the previous year's values, i.e.

    df['Date'] = pd.to_datetime(df['Date'])
    df = df.set_index('Date')
    years = (df.index.year).unique()
    for i, j in enumerate(years):
        if i != 0:
            # Rows of the previous year and of the current year
            prev = df.loc[df.index.year == years[i-1]]
            curr = df.loc[df.index.year == j]
            # Current value = previous year's value / (1 - current growth rate)
            df.loc[df.index.year == j, 'A'] = prev['A'].values / (1 - curr['new_growth_rate'].values)

Output:

                         A  new_growth_rate
    Date
    2011-01-01  100.000000              NaN
    2011-02-01  101.000000              NaN
    2012-01-01  103.626943            0.035
    2012-02-01  104.663212            0.035
    2013-01-01  107.496829            0.036
    2013-01-01  108.797518            0.038
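
Since each value is the value one year earlier divided by (1 - rate), the chain for a given calendar month is just the base value times a running product of 1/(1 - rate). That suggests a loop-free variant, sketched below. It is only a sketch under some assumptions that are not stated in the question: the index is a sorted DatetimeIndex with exactly one row per month and no missing years, the base-year rows have NaN in new_growth_rate, and the last date in the sample is 2013-02-01 (as in the desired output) rather than a repeated 2013-01-01. The function name project_values is made up for illustration.

```python
import numpy as np
import pandas as pd


def project_values(df, value_col="A", rate_col="new_growth_rate"):
    """Chain value = (value one year earlier) / (1 - rate) without a Python loop.

    Assumes a sorted DatetimeIndex with one row per month and no gaps.
    Rows with a NaN rate are treated as observed base values and kept as-is.
    """
    month = df.index.month
    has_rate = df[rate_col].notna()
    # Growth factor per row; rows without a rate contribute a factor of 1.
    factor = (1.0 / (1.0 - df[rate_col])).where(has_rate, 1.0)
    # Running product of the factors within each calendar month.
    cum_factor = factor.groupby(month).cumprod()
    # Last observed base value for each calendar month, carried forward.
    base = df[value_col].where(~has_rate).groupby(month).ffill()
    return base * cum_factor


# Sample data from the question (last date assumed to be 2013-02-01).
df = pd.DataFrame(
    {
        "Date": ["2011/01/01", "2011/02/01", "2012/01/01",
                 "2012/02/01", "2013/01/01", "2013/02/01"],
        "A": [100.0, 101.0, 120.0, 121.0, 131.0, 133.0],
        "new_growth_rate": [np.nan, np.nan, 0.035, 0.035, 0.036, 0.038],
    }
)
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()

result = project_values(df)
df["A"] = result
print(df)
```

Under those assumptions this should give the same values as the loop above. For many columns, the same function could be called once per value/rate column pair; each call does a fixed number of vectorized passes over the rows instead of one lookup per date.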

Hope this helps
