Optimization of iterative calculation of values ​​based on growth rate

This is my data file:

Date A new_growth_rate 2011/01/01 100 2011/02/01 101 . 2012/01/01 120 0.035 2012/02/01 121 0.035 . 2013/01/01 131 0.036 2013/01/01 133 0.038 

This is what I need:

 Date A new_growth_rate 2011/01/01 100 2011/02/01 101 . . 2012/01/01 103.62 .035 A=100/(1-0.035) 2012/02/01 104.66 .035 A=101/(1-0.035) . . 2013/01/01 107.49 .036 A=103.62/(1-0.036) 2013/02/01 108.68 .038 A=104.66/(1-0.038) 

I need to calculate a value based on the growth rate for each column. I have a dataframe with 400 columns and their corresponding growth rates.

I calculated the growth rate using the following formula: (one year old value)*(1+current month growth rate) . this calculated value will be used to get the value of the next year, etc. For example, I have 400 columns and their corresponding growth rate. The time series contains 30 years of data

I am currently using 2 for a loop to get each column, and then a second for iterating over the time period for each column and getting the values ​​calculated in the previous loop. It takes several hours to go through more than 500 rows and 400 columns. Is there a better way to do this?

My code snippet is below:

grpby = list of columns in dataframe

 df_new=pd.DataFrame() for i,row in grpby.iterrows(): df_csr=grwth.loc[(grwth['A']==row['A'])].copy() a = pd.to_datetime("2011-12-01",format='%Y-%m-%d') b = a while b <a+relativedelta.relativedelta(months=420): b=b+relativedelta.relativedelta(months=1) val= df_csr.loc[df_csr['Date']==(b+relativedelta.relativedelta(months=-12))].copy() val2=val.get_value(val.index[0],'Val') grwth_r=df_csr.loc[df_csr['date']==b]['new_growth_rate'].copy() grwth_r2=grwth_r.get_value(grwth_r.index[0],'new_growth_rate') df_csr.loc[df_csr['Date']==b,'Val']=val2/(1-grwth_r2) df_new=pd.concat([df_new,df_csr]) 
+6
source share
1 answer

You can use the year value as an index, and then use a simple loop to assign ie data

 df['Date'] = pd.to_datetime(df['Date']) df = df.set_index('Date') years = (df.index.year).unique() for i,j in enumerate(years): if i != 0: prev = df.loc[df.index.year == years[i-1]] curr = df.loc[df.index.year == j] df.loc[df.index.year == j,'A'] = prev['A'].values/(1-curr['new_growth_rate'].values) 

Exit:

  A new_growth_rate
 Date                                   
 2011-01-01 100.000000 NaN
 2011-02-01 101.000000 NaN
 2012-01-01 103.626943 0.035
 2012-02-01 104.663212 0.035
 2013-01-01 107.496829 0.036
 2013-01-01 108.797518 0.038

Hope this helps

+1
source

Source: https://habr.com/ru/post/1012746/


All Articles