How to count the longest continuous sequence of True values in pandas

Say I have a pd.Series as below

s = pd.Series([False, True, False, True, True, True, False, False])

0    False
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool

I want to know how long the longest consecutive run of True values is; in this example it is 3.

I tried this naive approach:

s_list = s.tolist()
count = 0
max_count = 0
for item in s_list:
    if item:
        count +=1
    else:
        if count>max_count:
            max_count = count
        count = 0
print(max_count)

It will print 3, but for a Series that is all True it will print 0.

+4
6 answers

Option 1
Use the Series as a boolean mask on the cumulative sum of its negation, then use value_counts:

(~s).cumsum()[s].value_counts().max()

3

Explanation

  • (~s).cumsum() is a fairly standard way to create separate group labels for runs of contiguous True/False values:

    0    1
    1    1
    2    2
    3    2
    4    2
    5    2
    6    3
    7    4
    dtype: int64
    
  • Positions that share the same label form one run: every False bumps the cumulative sum (a False becomes True in (~s)), while a run of Trues keeps the label it started with. Masking with [s] then keeps only the positions that were True:

    (~s).cumsum()[s]
    
    1    1
    3    2
    4    2
    5    2
    dtype: int64
    
  • All that remains of group 2 is the run of consecutive Trues. Take value_counts of these labels and then max.
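As a quick sketch, here is the same expression split into named steps so the intermediate values are visible (reusing s from the question):

groups = (~s).cumsum()[s]     # group label at each True position: 1, 2, 2, 2
groups.value_counts().max()   # group 2 occurs three times, so the result is 3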


Option 2
Use factorize and bincount:

a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()

3


This is the same idea as option 1, but executed in numpy for speed. pd.factorize relabels the group values produced by (~a).cumsum() as consecutive integers starting from 0, and np.bincount then counts how many True positions fall into each group; the max of those counts is the answer.

pd.factorize and np.bincount together play the role that the boolean mask and value_counts played in option 1, only faster.
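A sketch of the intermediate arrays for the example series (reusing s, np and pd from above):

a = s.values
grp = (~a).cumsum()          # array([1, 1, 2, 2, 2, 2, 3, 4])
b = pd.factorize(grp)[0]     # relabelled from zero: array([0, 0, 1, 1, 1, 1, 2, 3])
b[a]                         # codes of the True positions: array([0, 1, 1, 1])
np.bincount(b[a])            # count per code: array([1, 3])
np.bincount(b[a]).max()      # 3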


Option 3
Simplify option 2: the factorize step isn't actually needed, since (~a).cumsum() already produces small non-negative integers that np.bincount can consume directly:

a = s.values
np.bincount((~a).cumsum()[a]).max()

3
+7

Or, working from the index positions of the False values:

pd.Series(s.index[~s].values).diff().max()-1
Out[57]: 3.0
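Step by step on the example series (reusing s from the question): the False values sit at positions 0, 2, 6 and 7, and the gaps between consecutive positions, minus 1, count the Trues in between. Note this assumes every run of Trues is followed by a False; a run at the very end of the series would be missed.

false_pos = pd.Series(s.index[~s].values)   # 0, 2, 6, 7
false_pos.diff()                            # NaN, 2.0, 4.0, 1.0
false_pos.diff().max() - 1                  # 4.0 - 1 = 3.0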

Or without pandas, in pure Python:

from itertools import groupby
max([len(list(group)) for key, group in groupby(s.tolist())])
Out[73]: 3

But that counts runs of both True and False; to keep only the True runs:

from itertools import compress
max(list(compress([len(list(group)) for key, group in groupby(s.tolist())],[key for key, group in groupby(s.tolist())])))
Out[84]: 3
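For the example series the two intermediate lists look like this (reusing s and the itertools imports above):

runs = [len(list(group)) for key, group in groupby(s.tolist())]   # [1, 1, 1, 3, 2]
keys = [key for key, group in groupby(s.tolist())]                # [False, True, False, True, False]
max(compress(runs, keys))                                         # keeps only the True runs, giving 3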
+4

Here are two more options inspired by piRSquared's answer. Both work from the positions of the False values rather than from group sizes.

(np.diff(np.flatnonzero(np.append(True, np.append(~s.values, True)))) - 1).max()

(np.diff(s.where(~s).dropna().index.values) - 1).max()

(The first option appends a True at both ends of the negated array, so every run of Trues is bracketed by a False position on both sides; that is why it also works when the series starts or ends with a run of True, unlike the second, shorter option.)

How the second option works:

Find the positions of the False values in s; everything strictly between two consecutive False positions is a run of True.

  • s.where(s == False).dropna().index.values gives the index positions of the False values:

    array([0, 2, 6, 7])
    

  • Between each pair of consecutive False positions there are only True values of s. Taking np.diff gives the spacing between those positions:

    array([2, 4, 1])
  • Subtracting 1 from each difference gives the number of True values in each run.

  • Take the max of those counts. (The intermediate arrays for the first, padded option are sketched below.)
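Here is the first (padded) option broken into steps. The appended True sentinels act as virtual False values around s, so every run of Trues is bracketed on both sides, which is why this version also copes with a series that starts or ends with True (reusing s and np from above):

padded = np.append(True, np.append(~s.values, True))   # sentinels at both ends
pos = np.flatnonzero(padded)   # array([0, 1, 3, 7, 8, 9]), the padded False positions
np.diff(pos) - 1               # array([0, 1, 3, 0, 0]), lengths of the True runs in between
(np.diff(pos) - 1).max()       # 3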

+2

Grouping by the cumulative sum (the same groups as in @piRSquared's answer) and summing:

s.groupby((~s).cumsum()).sum().max()
Out[513]: 3.0
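For the example series the per-group sums look like this; True counts as 1 when summed, so each group's sum is the length of its run of Trues (reusing s from the question):

s.groupby((~s).cumsum()).sum()         # group 1 -> 1, group 2 -> 3, groups 3 and 4 -> 0
s.groupby((~s).cumsum()).sum().max()   # 3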

Another option is to use a lambda function with apply:

s.to_frame().apply(lambda x: s.loc[x.name:].idxmin() - x.name, axis=1).max()
Out[429]: 3
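The lambda computes, for every index, the distance to the first False at or after that index, i.e. the length of the run of Trues starting there, and the max is taken over all rows. On the example series the per-row distances are (reusing s from the question):

dist = s.to_frame().apply(lambda x: s.loc[x.name:].idxmin() - x.name, axis=1)
dist.tolist()   # [0, 1, 0, 3, 2, 1, 0, 0]
dist.max()      # 3

Like the diff-based answers, this relies on the longest run being followed by a False; if the series ended with a run of True, idxmin would return the start of the slice and that run would be undercounted.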
+2

I'm not quite sure how to do this with pandas, but what about using itertools.groupby?

>>> import pandas as pd
>>> from itertools import groupby
>>> s = pd.Series([False, True, False, True, True, True, False, False])
>>> max(sum(1 for _ in g) for k, g in groupby(s) if k)
3
+1

Your code was actually very close. It works with a minor fix:

count = 0
maxCount = 0
for item in s:
    if item:
        count += 1
        if count > maxCount:
            maxCount = count
    else:
        count = 0
print(maxCount)
+1

Source: https://habr.com/ru/post/1693891/

