I have the following DataFrame:
import numpy as np
import pandas as pd

arrays = [np.array(['1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4']),
          np.array(['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'])]
df = pd.DataFrame(np.random.randn(12, 3), index=arrays, columns=['Column1', 'Column2', 'Column3'])
df.index.names = ['Index1', 'Index2']
It looks like this:
                Column1   Column2   Column3
Index1 Index2
1      A      -0.218251  1.744845 -0.241300
       B       1.107614 -0.059469  0.952544
       C       0.203066  0.412727  0.057129
2      A       0.432153  0.568879 -1.014900
       B      -0.713515 -0.790029  1.530333
       C       0.547787 -0.161020  0.078548
3      A       0.425833 -0.316999 -0.516260
       B       0.980780  0.844847  1.097464
       C      -1.724548  0.199910  0.961234
4      A       0.130533 -1.249353 -0.848859
       B      -0.674836  1.404397  1.258285
       C       0.741651  1.578671 -1.411311
What I want to do is split / apply / merge and get back a DataFrame that looks like this:
                Column1   Column2   Column3
Index1 Index2
1      B       1.107614 -0.059469  0.952544
       C       0.203066  0.412727  0.057129
2      B      -0.713515 -0.790029  1.530333
       C       0.547787 -0.161020  0.078548
3      A       0.425833 -0.316999 -0.516260
       B       0.980780  0.844847  1.097464
4      A       0.130533 -1.249353 -0.848859
       B      -0.674836  1.404397  1.258285
In other words: at time 1 (Index1 = 1), take the two of A / B / C with the largest Column1 values (here B and C), and keep only those two rows for times 1 and 2.
Then, at time 3, again take the two of A / B / C with the largest Column1 values (this time A and B), and keep only those two rows for times 3 and 4.
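A minimal sketch of that rule as a custom function applied per block with groupby. It assumes each block is exactly two consecutive Index1 values taken in order of appearance (1 and 2, then 3 and 4), and it types in the numbers printed above instead of calling np.random.randn so the result is reproducible. `keep_top` and `block` are hypothetical names chosen for illustration, not pandas API.

```python
import pandas as pd

# The values printed above, typed in so the output is reproducible.
data = [
    ('1', 'A', -0.218251,  1.744845, -0.241300),
    ('1', 'B',  1.107614, -0.059469,  0.952544),
    ('1', 'C',  0.203066,  0.412727,  0.057129),
    ('2', 'A',  0.432153,  0.568879, -1.014900),
    ('2', 'B', -0.713515, -0.790029,  1.530333),
    ('2', 'C',  0.547787, -0.161020,  0.078548),
    ('3', 'A',  0.425833, -0.316999, -0.516260),
    ('3', 'B',  0.980780,  0.844847,  1.097464),
    ('3', 'C', -1.724548,  0.199910,  0.961234),
    ('4', 'A',  0.130533, -1.249353, -0.848859),
    ('4', 'B', -0.674836,  1.404397,  1.258285),
    ('4', 'C',  0.741651,  1.578671, -1.411311),
]
df = (pd.DataFrame(data, columns=['Index1', 'Index2', 'Column1', 'Column2', 'Column3'])
        .set_index(['Index1', 'Index2']))


def keep_top(block_df, n=2, col='Column1'):
    """Hypothetical helper: keep the n Index2 labels ranked highest on `col`
    at the first time step of this block, for every time step in the block."""
    first_time = block_df.index.get_level_values('Index1')[0]
    # xs drops the Index1 level, so the nlargest index holds Index2 labels.
    leaders = block_df.xs(first_time, level='Index1').nlargest(n, col).index
    mask = block_df.index.get_level_values('Index2').isin(leaders)
    return block_df[mask]


# Number the time values 0, 1, 2, 3 in order of appearance, then pair them up:
# times '1','2' -> block 0 and times '3','4' -> block 1 (assumed block size 2).
codes, _ = pd.factorize(df.index.get_level_values('Index1'))
block = codes // 2

# group_keys=False stops the block number from being added as an extra index level.
result = df.groupby(block, group_keys=False).apply(keep_top)
print(result)
```

With these numbers the rows kept are (1, B), (1, C), (2, B), (2, C), (3, A), (3, B), (4, A), (4, B), the same as the desired output. For a different block length, change the `// 2`; for a different number of leaders, pass another `n`.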
Is there a way to do this with groupby, nlargest and other built-in functions, or do I need to write a custom function?
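A sketch that avoids a custom function and uses only built-ins, literally as split / apply / merge: `SeriesGroupBy.nlargest` picks the leaders at the first time step of each block, and an inner `merge` keeps the matching rows. It assumes blocks of two consecutive Index1 values in order of appearance and uses the printed numbers rather than random data; the `block` and `step` helper columns are hypothetical names added for illustration.

```python
import pandas as pd

# The values printed above, typed in so the output is reproducible.
data = [
    ('1', 'A', -0.218251,  1.744845, -0.241300),
    ('1', 'B',  1.107614, -0.059469,  0.952544),
    ('1', 'C',  0.203066,  0.412727,  0.057129),
    ('2', 'A',  0.432153,  0.568879, -1.014900),
    ('2', 'B', -0.713515, -0.790029,  1.530333),
    ('2', 'C',  0.547787, -0.161020,  0.078548),
    ('3', 'A',  0.425833, -0.316999, -0.516260),
    ('3', 'B',  0.980780,  0.844847,  1.097464),
    ('3', 'C', -1.724548,  0.199910,  0.961234),
    ('4', 'A',  0.130533, -1.249353, -0.848859),
    ('4', 'B', -0.674836,  1.404397,  1.258285),
    ('4', 'C',  0.741651,  1.578671, -1.411311),
]
df = (pd.DataFrame(data, columns=['Index1', 'Index2', 'Column1', 'Column2', 'Column3'])
        .set_index(['Index1', 'Index2']))

flat = df.reset_index()
# Assumed block size of 2: times '1','2' -> block 0, times '3','4' -> block 1.
codes, _ = pd.factorize(flat['Index1'])
flat['block'] = codes // 2
flat['step'] = codes % 2  # 0 marks the first time step of each block

# Split + apply: per block, the two largest Column1 values at the first time step.
first = flat[flat['step'] == 0]
top = first.groupby('block')['Column1'].nlargest(2)
# The last index level of `top` holds the row labels of `first`.
leaders = first.loc[top.index.get_level_values(-1), ['block', 'Index2']]

# Merge: an inner join keeps only rows whose (block, Index2) pair is a leader.
result = (flat.merge(leaders, on=['block', 'Index2'])
              .set_index(['Index1', 'Index2'])
              .sort_index()[['Column1', 'Column2', 'Column3']])
print(result)
```

The final `sort_index` is there so the row order does not depend on how `merge` orders an inner join; with these numbers it yields the same eight rows as the desired output.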