How to count the number of consecutive zero-valued columns on the right, up to the first non-zero element

Suppose I have the following data file:

    C1  C2  C3  C4
 0   1   2   3   0
 1   4   0   0   0
 2   0   0   0   3
 3   0   3   0   0

Then I want to add another column that, for each row, shows the number of consecutive zero-valued columns on the right (i.e. the count of trailing zeros). The new column will look like this:

    Cnew
 0     1
 1     3
 2     0
 3     2
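For anyone who wants to follow along, the sample DataFrame can be built directly. A minimal sketch, assuming pandas; the column names and values are taken from the question:

```python
import pandas as pd

# Sample data from the question: each row has a different number of
# trailing zeros (1, 3, 0 and 2 respectively)
df = pd.DataFrame({'C1': [1, 4, 0, 0],
                   'C2': [2, 0, 0, 3],
                   'C3': [3, 0, 0, 0],
                   'C4': [0, 0, 3, 0]})

print(df)
```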
3 answers

You can use:

  • reverse the column order with iloc[:, ::-1]
  • take the cumulative sum per row ( axis=1 )
  • compare with eq(0) and count the True values with sum

 df['new'] = df.iloc[:,::-1].cumsum(axis=1).eq(0).sum(axis=1)
 print (df)
    C1  C2  C3  C4  new
 0   1   2   3   0    1
 1   4   0   0   0    3
 2   0   0   0   3    0
 3   0   3   0   0    2
 print (df.iloc[:,::-1])
    C4  C3  C2  C1
 0   0   3   2   1
 1   0   0   0   4
 2   3   0   0   0
 3   0   0   3   0

 print (df.iloc[:,::-1].cumsum(axis=1))
    C4  C3  C2  C1
 0   0   3   5   6
 1   0   0   0   4
 2   3   3   3   3
 3   0   0   3   3

 print (df.iloc[:,::-1].cumsum(axis=1).eq(0))
       C4     C3     C2     C1
 0   True  False  False  False
 1   True   True   True  False
 2  False  False  False  False
 3   True   True  False  False

I would use argmax on a boolean array. And by dropping straight into numpy, I can make this very fast.

 (df.values[:, ::-1] != 0).argmax(1)

 array([1, 3, 0, 2])

Or, very similarly:

 (df.values[:, ::-1].astype(bool)).argmax(1)

 array([1, 3, 0, 2])

I can put it in a new column with assign

 df.assign(new=(df.values[:, ::-1] != 0).argmax(1))

    C1  C2  C3  C4  new
 0   1   2   3   0    1
 1   4   0   0   0    3
 2   0   0   0   3    0
 3   0   3   0   0    2

Or add a new column in place

 df['new'] = (df.values[:, ::-1] != 0).argmax(1)
 df

    C1  C2  C3  C4  new
 0   1   2   3   0    1
 1   4   0   0   0    3
 2   0   0   0   3    0
 3   0   3   0   0    2
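One caveat worth noting (my addition, not part of the original answer): argmax returns 0 for a row that contains no non-zero values at all, so an all-zero row would be counted as 0 trailing zeros instead of the full row width. A sketch of a guard using np.where, assuming the same column layout:

```python
import numpy as np
import pandas as pd

# Second row is all zeros on purpose, to show the edge case
df = pd.DataFrame({'C1': [1, 0], 'C2': [2, 0],
                   'C3': [3, 0], 'C4': [0, 0]})

rev = df.values[:, ::-1] != 0       # reversed boolean mask, True at non-zeros
has_nonzero = rev.any(axis=1)       # which rows contain any non-zero at all
# argmax finds the first non-zero from the right; all-zero rows
# fall back to the full column count instead of argmax's misleading 0
counts = np.where(has_nonzero, rev.argmax(axis=1), df.shape[1])
print(counts)  # [1 4]
```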

Timing

We save time by reducing the necessary work: we only need to find the position of the first non-zero value per row, nothing more.

 # My first variant
 %timeit df.assign(new=(df.values[:, ::-1] != 0).argmax(1))

 # My second variant
 %timeit df.assign(new=(df.values[:, ::-1].astype(bool)).argmax(1))

 # jezrael solution
 %timeit df.assign(new=df.iloc[:,::-1].cumsum(1).eq(0).sum(1))

 # numpy version of jezrael solution
 %timeit df.assign(new=(df.values[:,::-1].cumsum(1) == 0).sum(1))

 # Scott Boston solution
 %timeit df.assign(new=df.iloc[:,::-1].eq(0).cumprod(axis=1).sum(axis=1))

 # numpy version of Scott Boston solution
 %timeit df.assign(new=(df.values[:,::-1] == 0).cumprod(1).sum(1))

small data

 1000 loops, best of 3: 301 µs per loop
 1000 loops, best of 3: 273 µs per loop
 1000 loops, best of 3: 770 µs per loop
 1000 loops, best of 3: 323 µs per loop
 1000 loops, best of 3: 647 µs per loop
 1000 loops, best of 3: 324 µs per loop

big data

 df = pd.DataFrame(np.random.choice([0, 1], (10000, 100), p=(.7, .3)))

 100 loops, best of 3: 6.03 ms per loop
 100 loops, best of 3: 5.3 ms per loop
 100 loops, best of 3: 16.9 ms per loop
 100 loops, best of 3: 9 ms per loop
 100 loops, best of 3: 10.7 ms per loop
 100 loops, best of 3: 10.1 ms per loop

Use eq , cumprod and sum (this is very similar to the question linked here).

 df.iloc[:,::-1].eq(0).cumprod(axis=1).sum(axis=1) 

Output:

 0    1
 1    3
 2    0
 3    2
 dtype: int64
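Put together as a runnable sketch (the DataFrame construction is assumed from the question's data):

```python
import pandas as pd

df = pd.DataFrame({'C1': [1, 4, 0, 0],
                   'C2': [2, 0, 0, 3],
                   'C3': [3, 0, 0, 0],
                   'C4': [0, 0, 3, 0]})

# Reverse the columns, mark zeros with eq(0); cumprod stays 1 only while
# every value seen so far (reading from the right) is zero, so summing
# the row counts exactly the trailing zeros.
df['new'] = df.iloc[:, ::-1].eq(0).cumprod(axis=1).sum(axis=1)
print(df['new'].tolist())  # [1, 3, 0, 2]
```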

Source: https://habr.com/ru/post/1270109/

