Pandas: find the name of the column with the second-highest value

I am trying to find the column names associated with the largest and second-largest values in each row of a DataFrame. Here is a simplified example (the real one has more than 500 columns):

    Date  val1  val2  val3  val4
    1990     5     7     1    10
    1991     2     1    10     3
    1992    10     9     6     1
    1993    50    10     2    15
    1994     1    15     7     8

Need to become:

    Date  1larg  2larg
    1990   val4   val2
    1991   val3   val4
    1992   val1   val2
    1993   val1   val4
    1994   val2   val4

I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest?

1 answer

(Your rows do not contain duplicate maximum values, so I'll assume that if you have [1,1,2,2] you want to select val3 and val4.)

One way is to use the result of argsort as an index into the array of column names.

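    import numpy as np
    import pandas as pd

    df = df.set_index("Date")
    # Column positions sorted by value within each row, ascending (stable for ties)
    arank = np.argsort(df.to_numpy(), axis=1, kind="stable")
    # Reverse to descending, keep the top two, and map positions to column names
    ranked_cols = df.columns.to_numpy()[arank[:, ::-1][:, :2]]
    new_frame = pd.DataFrame(ranked_cols, index=df.index)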

produces

             0     1
    Date
    1990  val4  val2
    1991  val3  val4
    1992  val1  val2
    1993  val1  val4
    1994  val2  val4
    1995  val4  val3

(where I added an extra row, 1995, with the values [1,1,2,2].)
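For anyone who wants to reproduce this end to end, here is a self-contained sketch of the same argsort idea using the sample data from the question plus the extra 1995 row. The helper name `top_n_columns` and its `n` parameter are my own invention, not part of pandas, and the sketch assumes every value column is numeric.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus the extra 1995 row with ties
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994, 1995],
        "val1": [5, 2, 10, 50, 1, 1],
        "val2": [7, 1, 9, 10, 15, 1],
        "val3": [1, 10, 6, 2, 7, 2],
        "val4": [10, 3, 1, 15, 8, 2],
    }
).set_index("Date")


def top_n_columns(frame, n=2):
    """Hypothetical helper: names of the n largest-valued columns per row."""
    # Stable ascending sort of column positions within each row
    order = np.argsort(frame.to_numpy(), axis=1, kind="stable")
    # Reverse to descending order and keep the first n positions
    top = order[:, ::-1][:, :n]
    # Index the column-name array with the 2-D position array
    return pd.DataFrame(frame.columns.to_numpy()[top], index=frame.index)


result = top_n_columns(df, n=2)
result.columns = ["1larg", "2larg"]
print(result)
```

Because the sorted positions are reversed, tied values come out with the later column first, which is why the 1995 row gives val4 before val3.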

Alternatively, you could probably melt the frame into a flat (long) format, select the two largest values in each Date group, and then pivot it back to wide format.
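A rough sketch of that melt-based route, using only the question's sample data, might look like the following. The intermediate column names (`col`, `value`, `rank`) are arbitrary choices of mine, and I have not compared its speed against the argsort version on a 500-column frame.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
)

# Long format: one row per (Date, column name, value)
long_df = df.melt(id_vars="Date", var_name="col", value_name="value")

# Sort by value descending within each Date, then keep the first two rows per Date
top2 = (
    long_df.sort_values(["Date", "value"], ascending=[True, False])
    .groupby("Date")
    .head(2)
    .copy()
)

# Position within each Date group: 0 for the largest, 1 for the second largest
top2["rank"] = top2.groupby("Date").cumcount()

# Pivot back to one row per Date with the column names as values
wide = top2.pivot(index="Date", columns="rank", values="col")
wide.columns = ["1larg", "2larg"]
print(wide)
```

Note that tied values may come out in a different order here than with the reversed argsort, so check the tie behaviour if it matters for your data.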


Source: https://habr.com/ru/post/1203294/

