You want to take the cumulative sum of data_binary and subtract from it the most recent cumulative sum at a row where data_binary is zero.
    b = df.data_binary
    c = b.cumsum()
    c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)

    0    1
    1    0
    2    1
    3    2
    4    3
    5    0
    6    0
    7    1
    Name: data_binary, dtype: int64
Explanation
Let's start by looking at each step side by side.
    cols = ['data_binary', 'cumulative_sum', 'nan_non_zero', 'forward_fill', 'final_result']
    print(pd.concat([
        b, c,
        c.mask(b != 0),
        c.mask(b != 0).ffill(),
        c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
    ], axis=1, keys=cols))

       data_binary  cumulative_sum  nan_non_zero  forward_fill  final_result
    0            1               1           NaN           NaN             1
    1            0               1           1.0           1.0             0
    2            1               2           NaN           1.0             1
    3            1               3           NaN           1.0             2
    4            1               4           NaN           1.0             3
    5            0               4           4.0           4.0             0
    6            0               4           4.0           4.0             0
    7            1               5           NaN           4.0             1
The problem with cumulative_sum is that the rows where data_binary is zero do not reset the sum, and that is the motivation for this solution. How do we "reset" the sum when data_binary is zero? Easy! I keep the cumulative sum only where data_binary is zero (everything else is masked to NaN) and forward-fill those values. When I subtract this from the cumulative sum, I effectively reset the sum at every zero.
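To try the steps above end to end, here is a minimal self-contained sketch. It assumes the input column can be reconstructed from the data_binary values shown in the table, and the helper name reset_cumsum is hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Assumption: input rebuilt from the data_binary column shown in the table above.
df = pd.DataFrame({"data_binary": [1, 0, 1, 1, 1, 0, 0, 1]})


def reset_cumsum(b: pd.Series) -> pd.Series:
    """Running sum of b that restarts at every zero (hypothetical helper)."""
    c = b.cumsum()
    # Keep the running total only at rows where b is zero, then carry it forward.
    baseline = c.mask(b != 0).ffill()
    # Rows before the first zero have no baseline (NaN); fill_value=0 treats it as 0.
    return c.sub(baseline, fill_value=0).astype(int)


result = reset_cumsum(df["data_binary"])
print(result.tolist())
```

The fill_value=0 argument matters only for the rows before the first zero, where there is nothing to subtract yet; without it those rows would come out as NaN.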