You can group by the VID column and count the rows in each group. Then use those counts to index your original df, keeping only the rows whose VID occurs more than 3 times. Example -
countdf = df.groupby('VID').count()
result = df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
Demo -
In [49]: df
Out[49]:
    VID value
1     1    xx
2     2   xx1
3     2   xx2
4     2   xx3
5     2   xx4
6     3    xx
7     3    xx
8     3    xx
9     4   zz1
10    4   zz2
11    4   zz3
12    4   zz4
13    4   zz5

In [51]: df.groupby('VID').count()
Out[51]:
     value
VID
1        1
2        4
3        3
4        5

In [52]: countdf = df.groupby('VID').count()

In [53]: df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
Out[53]:
    VID value
2     2   xx1
3     2   xx2
4     2   xx3
5     2   xx4
9     4   zz1
10    4   zz2
11    4   zz3
12    4   zz4
13    4   zz5
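If you want to run the filtering step yourself, here is a minimal, self-contained sketch. It rebuilds the demo frame by hand (the index 1..13 is copied from the output above) and the name `big_vids` is only an illustrative intermediate, not something from the original code.

```python
import pandas as pd

# Rebuild the demo frame shown above (index 1..13, as in the demo output).
df = pd.DataFrame(
    {
        "VID": [1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4],
        "value": ["xx", "xx1", "xx2", "xx3", "xx4", "xx", "xx", "xx",
                  "zz1", "zz2", "zz3", "zz4", "zz5"],
    },
    index=range(1, 14),
)

# Count the non-null 'value' entries per VID; the result is indexed by VID.
countdf = df.groupby("VID").count()

# VIDs whose group has more than 3 entries (illustrative intermediate name).
big_vids = countdf[countdf["value"] > 3].index

# Keep only the rows of the original frame that belong to those VIDs.
result = df.loc[df["VID"].isin(big_vids)]
print(result)
```

Note that `count()` only counts non-null entries in each column. If 'value' can contain NaN and you want to count rows regardless, `df.groupby('VID').size()` may be the better fit.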
After that you can group result by VID again, collect each group's values into a list, and then convert the resulting Series into a list of lists. Example -
resultlist = result.groupby('VID')['value'].apply(list).tolist()
Demo -
In [54]: result = df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]

In [55]: result.groupby('VID')['value'].apply(list).tolist()
Out[55]: [['xx1', 'xx2', 'xx3', 'xx4'], ['zz1', 'zz2', 'zz3', 'zz4', 'zz5']]
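One thing the list of lists loses is which VID each inner list came from. If you need that, a possible variation (not part of the approach above, just a sketch) is to keep the intermediate Series, which is indexed by VID, and turn it into a dict. The `result` frame below is a hand-built stand-in for the filtered frame from the demo, and `per_vid` and `by_vid` are illustrative names.

```python
import pandas as pd

# Hand-built stand-in for the filtered `result` frame from the demo.
result = pd.DataFrame(
    {
        "VID": [2, 2, 2, 2, 4, 4, 4, 4, 4],
        "value": ["xx1", "xx2", "xx3", "xx4",
                  "zz1", "zz2", "zz3", "zz4", "zz5"],
    }
)

# One list of values per VID; the intermediate Series is indexed by VID.
per_vid = result.groupby("VID")["value"].apply(list)

# List of lists; groupby sorts the keys by default, so this is in VID order.
resultlist = per_vid.tolist()

# The same lists, keyed by VID.
by_vid = per_vid.to_dict()

print(resultlist)
print(by_vid)
```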
Please note: the lists above do not contain the 'end' value. I assume it is not necessary, but if you really want it, you can append it to each list after building them. Example -
resultlist = [elem + ['end'] for elem in resultlist]
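Putting the steps together, the whole thing could be wrapped up roughly like this. The function name `lists_for_large_groups` and its parameters `min_count` and `terminator` are made up for this sketch; the body is just the three steps from this answer in order.

```python
import pandas as pd


def lists_for_large_groups(df, min_count=3, terminator="end"):
    """Hypothetical helper: one list of values per VID with more than min_count rows."""
    # Step 1: count entries per VID and keep VIDs above the threshold.
    countdf = df.groupby("VID").count()
    keep = countdf[countdf["value"] > min_count].index
    result = df.loc[df["VID"].isin(keep)]

    # Step 2: collect each remaining group's values into a list.
    resultlist = result.groupby("VID")["value"].apply(list).tolist()

    # Step 3: append the terminator to a copy of each list.
    return [elem + [terminator] for elem in resultlist]


# Demo frame rebuilt from the output shown earlier in this answer.
df = pd.DataFrame(
    {
        "VID": [1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4],
        "value": ["xx", "xx1", "xx2", "xx3", "xx4", "xx", "xx", "xx",
                  "zz1", "zz2", "zz3", "zz4", "zz5"],
    },
    index=range(1, 14),
)

output = lists_for_large_groups(df)
print(output)
```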