I would use argmax on a boolean array. Also, if I skip straight to numpy, I can do it very quickly.
(df.values[:, ::-1] != 0).argmax(1)

array([1, 3, 0, 2])
Or, very similarly
(df.values[:, ::-1].astype(bool)).argmax(1)

array([1, 3, 0, 2])
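This works because argmax over a boolean array returns the position of the first True (NumPy treats True as 1), so scanning each row right-to-left gives how far the last nonzero sits from the right edge. A minimal sketch, with the sample frame reconstructed from the outputs shown below:

```python
import numpy as np
import pandas as pd

# Sample frame inferred from the printed outputs in this answer
df = pd.DataFrame({'C1': [1, 4, 0, 0],
                   'C2': [2, 0, 0, 3],
                   'C3': [3, 0, 0, 0],
                   'C4': [0, 0, 3, 0]})

# Reverse each row, test for nonzero, then take the first True per row.
rev_nonzero = df.values[:, ::-1] != 0
print(rev_nonzero.argmax(1))  # [1 3 0 2]
```

Note the edge case: for an all-zero row argmax finds no True and returns 0, the same answer as a row whose last column is nonzero.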
I can put it in a new column with assign
df.assign(new=(df.values[:, ::-1] != 0).argmax(1))

   C1  C2  C3  C4  new
0   1   2   3   0    1
1   4   0   0   0    3
2   0   0   0   3    0
3   0   3   0   0    2
Or add a new column in place
df['new'] = (df.values[:, ::-1] != 0).argmax(1)
df

   C1  C2  C3  C4  new
0   1   2   3   0    1
1   4   0   0   0    3
2   0   0   0   3    0
3   0   3   0   0    2
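If what you actually want is the position (or label) of the last nonzero column counted from the left, the right-edge offset converts with a little arithmetic. A hedged sketch (the `offset`/`last_pos` names are mine, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [1, 4, 0, 0],
                   'C2': [2, 0, 0, 3],
                   'C3': [3, 0, 0, 0],
                   'C4': [0, 0, 3, 0]})

# Distance of the last nonzero from the right edge, as computed above
offset = (df.values[:, ::-1] != 0).argmax(1)

# Convert to a left-based column position, then to column labels
last_pos = df.shape[1] - 1 - offset
print(last_pos)                       # [2 0 3 1]
print(df.columns[last_pos].tolist())  # ['C3', 'C1', 'C4', 'C2']
```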
Timing
We reduce time by reducing the necessary work: we only need to find the position of the first nonzero, scanning from the right.
# My first variant
%timeit df.assign(new=(df.values[:, ::-1] != 0).argmax(1))

# My second variant
%timeit df.assign(new=(df.values[:, ::-1].astype(bool)).argmax(1))

# jezrael solution
%timeit df.assign(new=df.iloc[:, ::-1].cumsum(1).eq(0).sum(1))

# numpy version of jezrael solution
%timeit df.assign(new=(df.values[:, ::-1].cumsum(1) == 0).sum(1))

# Scott Boston solution
%timeit df.assign(new=df.iloc[:, ::-1].eq(0).cumprod(axis=1).sum(axis=1))

# numpy version of Scott Boston solution
%timeit df.assign(new=(df.values[:, ::-1] == 0).cumprod(1).sum(1))
small data
1000 loops, best of 3: 301 µs per loop
1000 loops, best of 3: 273 µs per loop
1000 loops, best of 3: 770 µs per loop
1000 loops, best of 3: 323 µs per loop
1000 loops, best of 3: 647 µs per loop
1000 loops, best of 3: 324 µs per loop
big data
df = pd.DataFrame(np.random.choice([0, 1], (10000, 100), p=(.7, .3)))

100 loops, best of 3: 6.03 ms per loop
100 loops, best of 3: 5.3 ms per loop
100 loops, best of 3: 16.9 ms per loop
100 loops, best of 3: 9 ms per loop
100 loops, best of 3: 10.7 ms per loop
100 loops, best of 3: 10.1 ms per loop
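`%timeit` is IPython magic, so outside a notebook the same comparison can be run with the standard-library `timeit` module. A plain-script sketch of the benchmark above (the variant labels and the fixed random seed are my additions):

```python
import timeit
import numpy as np
import pandas as pd

np.random.seed(0)  # seed added for reproducibility; not in the original
df = pd.DataFrame(np.random.choice([0, 1], (10000, 100), p=(.7, .3)))

variants = {
    'argmax != 0':        lambda: df.assign(new=(df.values[:, ::-1] != 0).argmax(1)),
    'argmax astype bool': lambda: df.assign(new=(df.values[:, ::-1].astype(bool)).argmax(1)),
    'cumsum (pandas)':    lambda: df.assign(new=df.iloc[:, ::-1].cumsum(1).eq(0).sum(1)),
    'cumsum (numpy)':     lambda: df.assign(new=(df.values[:, ::-1].cumsum(1) == 0).sum(1)),
    'cumprod (pandas)':   lambda: df.assign(new=df.iloc[:, ::-1].eq(0).cumprod(axis=1).sum(axis=1)),
    'cumprod (numpy)':    lambda: df.assign(new=(df.values[:, ::-1] == 0).cumprod(1).sum(1)),
}

for name, fn in variants.items():
    # best of 3 repeats, 10 calls per repeat
    t = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{name:18s} {t * 1e3:.2f} ms per loop')
```

All six variants produce the same `new` column on this data, so the comparison is apples to apples.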