I have the following pandas DataFrame:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"first_column": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]})

    >>> df
        first_column
    0              0
    1              0
    2              0
    3              1
    4              1
    5              1
    6              0
    7              0
    8              1
    9              1
    10             0
    11             0
    12             0
    13             0
    14             1
    15             1
    16             1
    17             1
    18             1
    19             0
    20             0
first_column is a binary column of 0s and 1s. The ones occur in “clusters” of consecutive rows, and each cluster always contains at least two rows.
My goal is to create a column that "counts" the number of rows in each cluster of ones, repeated on every row of that cluster, with 0 on the rows where first_column is 0:
    >>> df
        first_column  counts
    0              0       0
    1              0       0
    2              0       0
    3              1       3
    4              1       3
    5              1       3
    6              0       0
    7              0       0
    8              1       2
    9              1       2
    10             0       0
    11             0       0
    12             0       0
    13             0       0
    14             1       5
    15             1       5
    16             1       5
    17             1       5
    18             1       5
    19             0       0
    20             0       0
This sounds like a job for df.loc, for example: df.loc[df.first_column == 1] ... something
I'm just not sure how to treat each “cluster” of ones separately, or how to flag every row of a cluster with that cluster's “row count”.
How can I do this?
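For reference, here is a minimal sketch of one way this might be done, assuming a cluster simply means a run of consecutive equal values. It does not use df.loc; instead it labels each run with shift and cumsum, then uses groupby with transform("size") to get the run length on every row. The names run_id and run_length are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"first_column": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]})

# Start a new run id whenever the value differs from the previous row
# (run_id is a hypothetical helper name, not part of the original data).
run_id = df["first_column"].ne(df["first_column"].shift()).cumsum()

# Length of each run, broadcast back onto every row of that run.
run_length = df.groupby(run_id)["first_column"].transform("size")

# Keep the run length only on rows that are 1; rows that are 0 get 0.
df["counts"] = run_length.where(df["first_column"].eq(1), 0)

print(df)
```

With the sample data this should give 3, 2 and 5 on the three clusters of ones and 0 everywhere else. Runs of zeros also get a run id in the intermediate step, which is why the final where call is needed to zero them out.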