Sort by one column value, save rows grouped by another column value

I have two (hundreds) dfs that are generated and then merged, which I would like to sort by keeping the rows with the same column names Din the original order:

In [120]: df_list[0]
Out[120]:

   A         B         C         D
0  0.564678  0.598355  0.606693  MA0835
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835

In [121]: df_list[1]
Out[121]:

   A         B         C         D
0  0.628844  0.614492  0.570333  MA1002
1  0.317790  0.293189  0.239368  MA1002
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002

In [122]: df = pd.concat(df_list[0:2])

In [122]: df
Out[122]:
   A         B         C         D
0  0.564678  0.598355  0.606693  MA0835
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835
0  0.628844  0.614492  0.570333  MA1002
1  0.317790  0.293189  0.239368  MA1002
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002

Standard sorting produces:

In [125]: df.sort_values('A',ascending=False)
Out[125]:
   A         B         C         D
0  0.628844  0.614492  0.570333  MA1002
0  0.564678  0.598355  0.606693  MA0835
1  0.317790  0.293189  0.239368  MA1002
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002

However, I would like to sort by Aand save the grouping of strings as indicated in D. This is the desired result:

   A         B         C         D   
0  0.628844  0.614492  0.570333  MA1002
1  0.317790  0.293189  0.239368  MA1002
2  0.000000  0.000000  0.000000  MA1002
3  0.000000  0.000000  0.000000  MA1002
4  0.000000  0.000000  0.000000  MA1002
0  0.564678  0.598355  0.606693  MA0835
1  0.066291  0.063587  0.662292  MA0835
2  0.000000  0.000000  0.010758  MA0835
3  0.000000  0.000000  0.097895  MA0835
4  0.000000  0.000000  0.136468  MA0835

Do I need to work with groupby, or is there another sorting / grouping method that I am not familiar with?

+4
source share
1 answer

Use the argument keysinpd.concat

keys = [(df.A.iloc[0], i) for i, df in enumerate(list_of_dfs)]
pd.concat(list_of_dfs, keys=keys) \
    .sort_index(ascending=[False, True, True]) \
    .reset_index(drop=True)

enter image description here

+2
source

Source: https://habr.com/ru/post/1649957/


All Articles