How to count the number of consecutive zero-valued columns on the right, up to the first non-zero element

Suppose I have the following data file:

    C1  C2  C3  C4
 0   1   2   3   0
 1   4   0   0   0
 2   0   0   0   3
 3   0   3   0   0

Then I want to add another column that, for each row, shows the number of consecutive zero-valued columns on the right (i.e. the count of trailing zeros). The new column will look like this:

    Cnew
 0     1
 1     3
 2     0
 3     2
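For anyone who wants to follow along, the sample DataFrame can be built directly. A minimal sketch, assuming pandas; the column names and values are taken from the question:

```python
import pandas as pd

# Sample data from the question: each row has a different number of
# trailing zeros (1, 3, 0 and 2 respectively)
df = pd.DataFrame({'C1': [1, 4, 0, 0],
                   'C2': [2, 0, 0, 3],
                   'C3': [3, 0, 0, 0],
                   'C4': [0, 0, 3, 0]})

print(df)
```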
3 answers

You can use:

  • reverse the column order with iloc[:, ::-1]
  • take the cumulative sum per row ( axis=1 )
  • compare with eq(0) and count the True values with sum

 df['new'] = df.iloc[:,::-1].cumsum(axis=1).eq(0).sum(axis=1)
 print (df)
    C1  C2  C3  C4  new
 0   1   2   3   0    1
 1   4   0   0   0    3
 2   0   0   0   3    0
 3   0   3   0   0    2
 print (df.iloc[:,::-1])
    C4  C3  C2  C1
 0   0   3   2   1
 1   0   0   0   4
 2   3   0   0   0
 3   0   0   3   0

 print (df.iloc[:,::-1].cumsum(axis=1))
    C4  C3  C2  C1
 0   0   3   5   6
 1   0   0   0   4
 2   3   3   3   3
 3   0   0   3   3

 print (df.iloc[:,::-1].cumsum(axis=1).eq(0))
       C4     C3     C2     C1
 0   True  False  False  False
 1   True   True   True  False
 2  False  False  False  False
 3   True   True  False  False

I would use argmax on a boolean array. And by dropping straight into numpy, I can make this very fast.

 (df.values[:, ::-1] != 0).argmax(1)

 array([1, 3, 0, 2])

Or, very similarly:

 (df.values[:, ::-1].astype(bool)).argmax(1)

 array([1, 3, 0, 2])

I can put it in a new column with assign

 df.assign(new=(df.values[:, ::-1] != 0).argmax(1))

    C1  C2  C3  C4  new
 0   1   2   3   0    1
 1   4   0   0   0    3
 2   0   0   0   3    0
 3   0   3   0   0    2

Or add a new column in place

 df['new'] = (df.values[:, ::-1] != 0).argmax(1)
 df

    C1  C2  C3  C4  new
 0   1   2   3   0    1
 1   4   0   0   0    3
 2   0   0   0   3    0
 3   0   3   0   0    2
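One caveat worth noting (my addition, not part of the original answer): argmax returns 0 for a row that contains no non-zero values at all, so an all-zero row would be counted as 0 trailing zeros instead of the full row width. A sketch of a guard using np.where, assuming the same column layout:

```python
import numpy as np
import pandas as pd

# Second row is all zeros on purpose, to show the edge case
df = pd.DataFrame({'C1': [1, 0], 'C2': [2, 0],
                   'C3': [3, 0], 'C4': [0, 0]})

rev = df.values[:, ::-1] != 0       # reversed boolean mask, True at non-zeros
has_nonzero = rev.any(axis=1)       # which rows contain any non-zero at all
# argmax finds the first non-zero from the right; all-zero rows
# fall back to the full column count instead of argmax's misleading 0
counts = np.where(has_nonzero, rev.argmax(axis=1), df.shape[1])
print(counts)  # [1 4]
```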

Timing

We save time by reducing the necessary work: we only need to find the position of the first non-zero value per row, nothing more.

 # My first variant
 %timeit df.assign(new=(df.values[:, ::-1] != 0).argmax(1))

 # My second variant
 %timeit df.assign(new=(df.values[:, ::-1].astype(bool)).argmax(1))

 # jezrael solution
 %timeit df.assign(new=df.iloc[:,::-1].cumsum(1).eq(0).sum(1))

 # numpy version of jezrael solution
 %timeit df.assign(new=(df.values[:,::-1].cumsum(1) == 0).sum(1))

 # Scott Boston solution
 %timeit df.assign(new=df.iloc[:,::-1].eq(0).cumprod(axis=1).sum(axis=1))

 # numpy version of Scott Boston solution
 %timeit df.assign(new=(df.values[:,::-1] == 0).cumprod(1).sum(1))

small data

 1000 loops, best of 3: 301 µs per loop
 1000 loops, best of 3: 273 µs per loop
 1000 loops, best of 3: 770 µs per loop
 1000 loops, best of 3: 323 µs per loop
 1000 loops, best of 3: 647 µs per loop
 1000 loops, best of 3: 324 µs per loop

big data

 df = pd.DataFrame(np.random.choice([0, 1], (10000, 100), p=(.7, .3)))

 100 loops, best of 3: 6.03 ms per loop
 100 loops, best of 3: 5.3 ms per loop
 100 loops, best of 3: 16.9 ms per loop
 100 loops, best of 3: 9 ms per loop
 100 loops, best of 3: 10.7 ms per loop
 100 loops, best of 3: 10.1 ms per loop

Use eq , cumprod and sum (this is very similar to the question linked here).

 df.iloc[:,::-1].eq(0).cumprod(axis=1).sum(axis=1) 

Output:

 0    1
 1    3
 2    0
 3    2
 dtype: int64
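Put together as a runnable sketch (the DataFrame construction is assumed from the question's data):

```python
import pandas as pd

df = pd.DataFrame({'C1': [1, 4, 0, 0],
                   'C2': [2, 0, 0, 3],
                   'C3': [3, 0, 0, 0],
                   'C4': [0, 0, 3, 0]})

# Reverse the columns, mark zeros with eq(0); cumprod stays 1 only while
# every value seen so far (reading from the right) is zero, so summing
# the row counts exactly the trailing zeros.
df['new'] = df.iloc[:, ::-1].eq(0).cumprod(axis=1).sum(axis=1)
print(df['new'].tolist())  # [1, 3, 0, 2]
```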

Source: https://habr.com/ru/post/1270109/

