Calculate the average of the previous n elements in a column in pandas

I have the following dataframe:

             df1
index   year   week   a     b     c
 -10    2017    10   45    26    19
  -9    2017    11   37    23    14
  -8    2017    12   21    66    19
  -7    2017    13   47    36    92
  -6    2017    14   82    65    18
  -5    2017    15   68    68    19
  -4    2017    16   30    95    24
  -3    2017    17   21    15    94
  -2    2017    18   67    30    16
  -1    2017    19   10    13    13
   0    2017    20   26    22    18
   1    2017    21   NaN   NaN   NaN
   2    2017    22   NaN   NaN   NaN
   3    2017    23   NaN   NaN   NaN
   4    2017    24   NaN   NaN   NaN
   ...
   53   2018    20   NaN   NaN   NaN

I need each empty cell to be filled with the average of the previous n values in the column. n is equal to the number of indices from zero and above. For example, for the first empty cell in column a, I have to calculate the average of the values between indices 0 and -10. Then for the next cell, between 1 and -9, etc. The same goes for columns a, b and c. The calculations always begin where index = 1.
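As a quick check of the arithmetic described above, here is a minimal sketch for column a only. The variable names are made up for illustration, and it assumes the window for index 2 already includes the value just computed for index 1, as the "between 1 and -9" wording suggests.

```python
import numpy as np

# Column a for index -10 .. 0, copied from the table in the question
a_known = np.array([45, 37, 21, 47, 82, 68, 30, 21, 67, 10, 26], dtype=float)

# First empty cell (index 1): mean of the 11 values at index -10 .. 0
first_fill = a_known.mean()

# Second empty cell (index 2): the window slides to index -9 .. 1,
# so it already contains the value computed for index 1
second_fill = np.append(a_known[1:], first_fill).mean()

print(round(first_fill, 6), round(second_fill, 6))
```

So each filled value feeds into the windows of the cells that follow it.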


UPD. , index = 0 down 53.



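import numpy as np
import pandas as pd

# data is your DataFrame; 'index' is assumed to be a regular column here,
# so the value columns (a, b, c) start at position 3
column_list = list(data.columns.values)[3:]
for column_name in column_list:
    # work on a float NumPy copy of the column
    column = data[column_name].to_numpy(dtype=float, copy=True)
    for index in range(column.shape[0]):
        # iterate over the entries in the column
        if np.isnan(column[index]):
            # mean of the 11 entries ending at this one (the NaN itself is ignored);
            # mode='wrap' wraps around to the end of the array near the top
            column[index] = np.nanmean(column.take(range(index - 10, index + 1), mode='wrap'))
    # write the filled values back to the DataFrame
    data[column_name] = column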

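This works as long as no NaN appears in the first 10 rows, because mode='wrap' would then pull values from the end of the column. In that case, or with a different window size n in place of 10, clamp the start of the slice at zero instead, e.g. new_df[index] = np.nanmean(new_df[max(0,index-10):index+1])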
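To see how the clamped slice np.nanmean(new_df[max(0,index-10):index+1]) differs from take(..., mode='wrap'), here is a small sketch. The array col holds made-up numbers (not the asker's data), with a NaN that has fewer than 10 rows before it.

```python
import numpy as np

# Hypothetical column: the NaN sits at position 2, close to the top
col = np.array([1.0, 2.0, np.nan, 4.0, 100.0, 200.0])
index = 2

# mode='wrap' maps out-of-range positions back into the array,
# so the window also picks up values from the end (100.0, 200.0)
wrapped = col.take(range(index - 10, index + 1), mode='wrap')
wrapped_mean = np.nanmean(wrapped)

# Clamping the start at zero keeps only the rows that really precede the NaN
clamped = col[max(0, index - 10):index + 1]
clamped_mean = np.nanmean(clamped)

print(wrapped_mean, clamped_mean)
```

The clamped version averages only 1.0 and 2.0, while the wrapped version is pulled upward by values that come after the empty cell.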



n = 11  # window size used in the example from the question
df = df1.loc[range(1, df1.index[-1] + 1)]  # select the rows with index 1 and above

df is now:

       year  week   a   b   c
index                        
1      2017    21 NaN NaN NaN
2      2017    22 NaN NaN NaN
3      2017    23 NaN NaN NaN
4      2017    24 NaN NaN NaN

Then fill in the values:

for s in list(df.index): # iterate through rows with nan values
    for i in range(2,df.columns.size): # iterate through different cols ('a','b','c' or more)
        df1.loc[s,df.columns[i]] = df1.loc[range(s-n,s),df.columns[i]].sum()/n
print(df1)


Output:

       year  week          a          b          c
index                                             
-10    2017    10  45.000000  26.000000  19.000000
-9     2017    11  37.000000  23.000000  14.000000
-8     2017    12  21.000000  66.000000  19.000000
-7     2017    13  47.000000  36.000000  92.000000
-6     2017    14  82.000000  65.000000  18.000000
-5     2017    15  68.000000  68.000000  19.000000
-4     2017    16  30.000000  95.000000  24.000000
-3     2017    17  21.000000  15.000000  94.000000
-2     2017    18  67.000000  30.000000  16.000000
-1     2017    19  10.000000  13.000000  13.000000
 0     2017    20  26.000000  22.000000  18.000000
 1     2017    21  41.272727  41.727273  31.454545
 2     2017    22  40.933884  43.157025  32.586777
 3     2017    23  41.291510  44.989482  34.276484
 4     2017    24  43.136193  43.079434  35.665255
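For anyone who wants to reproduce the numbers, here is a self-contained sketch of the same loop wrapped in a helper. The name fill_trailing_mean and its parameters are hypothetical, the frame is rebuilt by hand from the rows shown in the question (index -10 to 4 only), and it assumes a sorted integer index so that label slicing with .loc works.

```python
import numpy as np
import pandas as pd

def fill_trailing_mean(frame, columns, n, start):
    """Fill rows with label >= start using the mean of the n preceding rows."""
    out = frame.copy()
    for label in out.index[out.index >= start]:
        # .loc slicing is label-based and inclusive: label-n .. label-1
        window = out.loc[label - n:label - 1, columns]
        out.loc[label, columns] = window.mean().to_numpy()
    return out

# Sample data copied from the question (rows -10 .. 4)
df1 = pd.DataFrame(
    {
        "year": 2017,
        "week": range(10, 25),
        "a": [45, 37, 21, 47, 82, 68, 30, 21, 67, 10, 26] + [np.nan] * 4,
        "b": [26, 23, 66, 36, 65, 68, 95, 15, 30, 13, 22] + [np.nan] * 4,
        "c": [19, 14, 19, 92, 18, 19, 24, 94, 16, 13, 18] + [np.nan] * 4,
    },
    index=pd.RangeIndex(-10, 5, name="index"),
)

filled = fill_trailing_mean(df1, ["a", "b", "c"], n=11, start=1)
print(filled)
```

Because the helper works on a copy, df1 itself keeps its NaN rows, which makes it easy to try different values of n.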

Source: https://habr.com/ru/post/1680712/

