I have a pandas DataFrame like the one below:
>>> df.head()
0 1 2 3 4 5 6
0 35000 26009 OPTIDX BANKNIFTY XX 1499351400 BANKNIFTY1770621000CE
1 35001 26009 OPTIDX BANKNIFTY XX 1499351400 BANKNIFTY1770621000PE
2 35002 26000 OPTIDX NIFTY XX 1609425000 NIFTY20DEC10400CE
3 35003 26000 OPTIDX NIFTY XX 1609425000 NIFTY20DEC10400PE
4 35004 26009 OPTIDX BANKNIFTY XX 1499956200 BANKNIFTY1771321100CE
I want to group the rows by column 5, in sorted order, and return the first n groups, where n is specified as a variable.
I tried df.sort_values(5).groupby([5]), but that only gives me <pandas.core.groupby.DataFrameGroupBy object at 0x2afc8d0>.
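That object is expected: groupby is lazy and returns a DataFrameGroupBy, which only produces rows when you iterate over it or aggregate it. A minimal sketch, assuming the integer column labels from the sample, that iterates the groups and concatenates the first n. It relies on groupby sorting the group keys by default, so the explicit sort_values is not needed:

```python
from itertools import islice

import pandas as pd

# Rebuild the sample frame from the question (integer column labels 0..6).
df = pd.DataFrame([
    [35000, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000CE"],
    [35001, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000PE"],
    [35002, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400CE"],
    [35003, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400PE"],
    [35004, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499956200, "BANKNIFTY1771321100CE"],
])

n = 2  # number of groups to keep

# Iterating a GroupBy yields (key, sub-DataFrame) pairs in ascending key order.
grouped = df.groupby(5)
result = pd.concat([group for _, group in islice(grouped, n)])
print(result)
```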
How do I get all the rows in the first two groups? In the sample df above, group 1 is 1499351400, group 2 is 1499956200 and group 3 is 1609425000.
Expected result when the number of groups required is 2:
0 1 2 3 4 5 6
0 35000 26009 OPTIDX BANKNIFTY XX 1499351400 BANKNIFTY1770621000CE
1 35001 26009 OPTIDX BANKNIFTY XX 1499351400 BANKNIFTY1770621000PE
4 35004 26009 OPTIDX BANKNIFTY XX 1499956200 BANKNIFTY1771321100CE
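Since the expected result keeps the original row order and index (0, 1, 4), a boolean mask may fit better than concatenating groups. A sketch, where first_n_groups is a hypothetical helper name: take the sorted distinct values of column 5, keep the first n, and filter with isin, which does not depend on GroupBy methods at all:

```python
import pandas as pd

# Rebuild the sample frame from the question (integer column labels 0..6).
df = pd.DataFrame([
    [35000, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000CE"],
    [35001, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000PE"],
    [35002, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400CE"],
    [35003, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400PE"],
    [35004, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499956200, "BANKNIFTY1771321100CE"],
])


def first_n_groups(frame, col, n):
    """Hypothetical helper: rows whose `col` value is among the n smallest distinct values."""
    # Distinct keys in ascending order; the first n are the groups to keep.
    keys = sorted(frame[col].unique())[:n]
    # The boolean mask preserves the original row order and index labels.
    return frame[frame[col].isin(keys)]


result = first_n_groups(df, 5, 2)
print(result)
```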
Update 1: after trying @jezrael's suggestion:
>>> k2=k1[k1.groupby(5).ngroup() < 2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/python/2.7/lib/python2.7/site-packages/pandas/core/groupby.py", line 529, in __getattr__
(type(self).__name__, attr))
AttributeError: 'DataFrameGroupBy' object has no attribute 'ngroup'
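If I recall correctly, GroupBy.ngroup was only added in pandas 0.20.2, so an older install (as the Python 2.7 path in the traceback suggests) raises this AttributeError. A sketch of an equivalent filter that should also work on older versions, assuming the same integer column labels: a dense rank numbers the distinct values of column 5 as 1, 2, 3, ... in ascending order, which matches the group numbering ngroup would give plus one:

```python
import pandas as pd

# Rebuild the sample frame from the question (integer column labels 0..6).
df = pd.DataFrame([
    [35000, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000CE"],
    [35001, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000PE"],
    [35002, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400CE"],
    [35003, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400PE"],
    [35004, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499956200, "BANKNIFTY1771321100CE"],
])

n = 2

# Equal values share a rank and there are no gaps: 1499351400 -> 1, 1499956200 -> 2, 1609425000 -> 3.
group_number = df[5].rank(method="dense")
result = df[group_number <= n]
print(result)
```

pd.factorize(df[5], sort=True)[0] < n would be another way to build the same mask, since factorize with sort=True assigns codes 0, 1, 2, ... in ascending value order.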
Additionally, is it possible to do this without pandas (plain Python only)? I cannot always find machines with pandas installed. Thanks
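Without pandas, the same idea carries over to plain lists. A minimal sketch, assuming each row has already been read into a list (for example with the csv module) and that column 5 holds comparable values; first_n_groups is again a hypothetical name. It uses only builtins, so it should also run on Python 2.7:

```python
# Sample rows as plain lists (column 5 is the grouping key).
rows = [
    [35000, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000CE"],
    [35001, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499351400, "BANKNIFTY1770621000PE"],
    [35002, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400CE"],
    [35003, 26000, "OPTIDX", "NIFTY", "XX", 1609425000, "NIFTY20DEC10400PE"],
    [35004, 26009, "OPTIDX", "BANKNIFTY", "XX", 1499956200, "BANKNIFTY1771321100CE"],
]


def first_n_groups(rows, key_index, n):
    """Hypothetical helper: keep rows whose key is among the n smallest distinct keys."""
    # Distinct keys, sorted ascending; the first n define the groups to keep.
    keep = set(sorted({row[key_index] for row in rows})[:n])
    # Filter in one pass, preserving the original row order.
    return [row for row in rows if row[key_index] in keep]


result = first_n_groups(rows, 5, 2)
for row in result:
    print(row)
```

If the values are read from a text file, convert column 5 to int first; otherwise the keys are compared as strings, which only sorts correctly when they all have the same number of digits.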