Situation
I have a pandas framework that is defined as follows:
import pandas as pd
headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation']
data = [
['A', 1, 'x', 1.40, 0.59],
['A', 1, 'y', 9.19, 0.52],
['A', 2, 'x', 8.82, 0.80],
['A', 2, 'y', 7.18, 0.41],
['B', 1, 'x', 1.38, 0.22],
['B', 1, 'y', 7.14, 0.10],
['B', 2, 'x', 9.12, 0.28],
['B', 2, 'y', 4.11, 0.97],
]
df = pd.DataFrame(data, columns=headers)
which is as follows:
Group Element Case Score Evaluation
0 A 1 x 1.40 0.59
1 A 1 y 9.19 0.52
2 A 2 x 8.82 0.80
3 A 2 y 7.18 0.41
4 B 1 x 1.38 0.22
5 B 1 y 7.14 0.10
6 B 2 x 9.12 0.28
7 B 2 y 4.11 0.97
Problem
I would like to perform a grouping and aggregation operation on df, which will give me the following resulting data file:
Group Max_score_value Max_score_element Max_score_case Min_evaluation
0 A 9.19 1 y 0.41
1 B 9.12 2 x 0.10
To clarify in more detail: I would like to group a column Group, and then apply aggregation to get the following columns of results:
Max_score_value: The maximum value of the group from the column Score.Max_score_element: The value from the column Elementthat corresponds to the maximum value Score.Max_score_case: The value from the column Casethat corresponds to the maximum value Score.Min_evaluation: minimum value of a group from a column Evaluation.
Tried so far
I came up with the following code for grouping and aggregation:
result = (
df.set_index(['Element', 'Case'])
.groupby('Group')
.agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
.reset_index()
)
print(result)
which gives as output:
Group Score Evaluation
max idxmax min
0 A 9.19 (1, y) 0.41
1 B 9.12 (2, x) 0.10
, , , . , . - , ?