Vectorized calculation of a column value based on a previous value of the same column?

I have a pandas DataFrame with two columns, A and B, as shown below.

I want a vectorized way to create a new column C, where C[0] = A[0] and C[i] = C[i-1] - A[i] + B[i] for i ≥ 1.

    >>> import pandas as pd
    >>> df = pd.DataFrame(data={'A': [10, 2, 3, 4, 5, 6], 'B': [0, 1, 2, 3, 4, 5]})
    >>> df
        A  B
    0  10  0
    1   2  1
    2   3  2
    3   4  3
    4   5  4
    5   6  5

Here is a solution using a for loop:

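    >>> df['C'] = df['A']
    >>> for i in range(1, len(df)):
    ...     # .loc avoids chained assignment (df['C'][i] = ...), which may not write back to df
    ...     df.loc[i, 'C'] = df.loc[i - 1, 'C'] - df.loc[i, 'A'] + df.loc[i, 'B']
    ...
    >>> df
        A  B   C
    0  10  0  10
    1   2  1   9
    2   3  2   8
    3   4  3   7
    4   5  4   6
    5   6  5   5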

... which does the job.

However, since Python-level loops are much slower than vectorized operations, I would like a vectorized solution in pandas.

I tried using the shift() method as follows:

    df['C'] = df['C'].shift(1).fillna(df['A']) - df['A'] + df['B']

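but this does not work: shift() is evaluated once, on the values C holds before the assignment (here a copy of A), so each row sees the old C[i-1] instead of the newly computed one. The shifted column keeps those original values: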

    >>> df['C'].shift(1).fillna(df['A'])
    0    10
    1    10
    2     2
    3     3
    4     4
    5     5

and it gives the wrong result.
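To make the failure concrete, here is a minimal sketch (not part of the original thread). A shift-based update only sees the old values of C, so each pass corrects just one more row, and matching the loop's result takes len(df) - 1 passes, which is no better than the loop. The helper shift_pass is hypothetical, and pinning C[0] = A[0] is an assumption taken from the loop in the question (the question's one-liner instead puts B[0] in row 0, since A[0] is filled in and then subtracted).

```python
import pandas as pd

df = pd.DataFrame(data={'A': [10, 2, 3, 4, 5, 6], 'B': [0, 1, 2, 3, 4, 5]})

# Reference result from the loop in the question (C[0] = A[0]),
# built on a standalone Series with .loc to avoid chained assignment.
expected = df['A'].copy()
for i in range(1, len(df)):
    expected.loc[i] = expected.loc[i - 1] - df.loc[i, 'A'] + df.loc[i, 'B']


def shift_pass(c: pd.Series) -> pd.Series:
    """Hypothetical helper: one vectorized application of
    C[i] = C[i-1] - A[i] + B[i], where C[i-1] is read from the *old* c."""
    new = c.shift(1) - df['A'] + df['B']
    new.iloc[0] = df['A'].iloc[0]  # pin the starting value C[0] = A[0]
    return new


c = df['A'].astype(float)  # start from C = A, as in the question
passes = 0
while not (c == expected).all():
    c = shift_pass(c)
    passes += 1

# Each pass fixes only one more row, so this needs len(df) - 1 passes.
print(passes)
```

In other words, shift() turns the recurrence into a fixed-point iteration rather than solving it, which is why a closed form is needed.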

1 answer

This can be vectorized because the recurrence is just a running sum:

  • delta[i] = C[i] - C[i-1] = B[i] - A[i] for i ≥ 1, so you can first compute delta directly from A and B, then
  • take the cumulative sum of delta and add C[0] (= A[0]) to get the full column C.
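Written out, the telescoping argument behind these two steps is as follows, taking the starting value C_0 = A_0 from the loop in the question. For example, for i = 5 it gives 10 + (-1) * 5 = 5, matching the last row of the loop's output.

```latex
\begin{aligned}
\Delta_i &= C_i - C_{i-1} = B_i - A_i, \qquad i \ge 1, \\
C_i &= C_0 + \sum_{k=1}^{i} \Delta_k = A_0 + \sum_{k=1}^{i} \left( B_k - A_k \right).
\end{aligned}
```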

The code is as follows:

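    >>> delta = df['B'] - df['A']      # per-row step: C[i] - C[i-1] = B[i] - A[i]
    >>> delta.iloc[0] = 0              # row 0 has no predecessor, so no step
    >>> df['C'] = df['A'].iloc[0] + delta.cumsum()
    >>> print(df)
        A  B   C
    0  10  0  10
    1   2  1   9
    2   3  2   8
    3   4  3   7
    4   5  4   6
    5   6  5   5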
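A self-contained sketch of the same cumsum idea, checked against the loop from the question. The function names c_loop and c_vectorized are hypothetical; the sketch assumes a non-empty frame and C[0] = A[0] as in the question, and works on NumPy arrays so it does not depend on the index being 0..n-1.

```python
import numpy as np
import pandas as pd


def c_loop(df: pd.DataFrame) -> pd.Series:
    """Reference implementation: C[0] = A[0], C[i] = C[i-1] - A[i] + B[i]."""
    a = df['A'].to_numpy()
    b = df['B'].to_numpy()
    c = np.empty_like(a)
    c[0] = a[0]
    for i in range(1, len(a)):
        c[i] = c[i - 1] - a[i] + b[i]
    return pd.Series(c, index=df.index, name='C')


def c_vectorized(df: pd.DataFrame) -> pd.Series:
    """Same recurrence via cumsum: C[i] = A[0] + sum over k=1..i of (B[k] - A[k])."""
    # copy=True guarantees a writable array (it can be read-only under Copy-on-Write)
    delta = (df['B'] - df['A']).to_numpy(copy=True)
    delta[0] = 0  # row 0 contributes no step; C[0] is just A[0]
    return pd.Series(df['A'].iloc[0] + np.cumsum(delta), index=df.index, name='C')


df = pd.DataFrame(data={'A': [10, 2, 3, 4, 5, 6], 'B': [0, 1, 2, 3, 4, 5]})
df['C'] = c_vectorized(df)
print(df)  # should show the same C column as the loop version
```

Both functions do O(n) work, but the cumsum version runs that work inside NumPy instead of the Python interpreter, which is why it is typically much faster on large frames.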

Source: https://habr.com/ru/post/1239307/

