Pandas: column name of the second largest value

Question

I am trying to find the column names associated with the largest and second largest values in each row of a DataFrame. Here is a simplified example (the real one has more than 500 columns):

    Date  val1  val2  val3  val4
    1990     5     7     1    10
    1991     2     1    10     3
    1992    10     9     6     1
    1993    50    10     2    15
    1994     1    15     7     8

This needs to become:

    Date  1larg  2larg
    1990  val4   val2
    1991  val3   val4
    1992  val1   val2
    1993  val1   val4
    1994  val2   val4

I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest?

pandas dataframe

AtotheSiv 24 sept '14 at 11:13

1 answer

DSM · Accepted Answer · 2014-09-24T12:11:35+0000

(You don't have any duplicate maximum values in your rows, so I'll assume that for a row like [1,1,2,2] you want val3 and val4 selected.)

One way is to use the result of an argsort as an index into the column names.

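    import numpy as np
    import pandas as pd

    df = df.set_index("Date")
    # Per row, the column positions sorted by ascending value
    arank = df.apply(np.argsort, axis=1)
    # Reverse to descending order and keep the first two positions
    ranked_cols = df.columns.to_numpy()[arank.to_numpy()[:, ::-1][:, :2]]
    new_frame = pd.DataFrame(ranked_cols, index=df.index)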

produces

             0     1
    Date
    1990  val4  val2
    1991  val3  val4
    1992  val1  val2
    1993  val1  val4
    1994  val2  val4
    1995  val4  val3

(where I added an extra row, 1995, with the values [1,1,2,2].)

Alternatively, you could probably melt the frame into a flat (long) format, pick the two largest values in each Date group, and then pivot it back into wide form.
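The melt route is only described in words above, so here is a minimal sketch of how it might look. It rebuilds the sample frame from the question; the helper column names `col`, `value` and `rank` are made up for the illustration, and the output columns are labelled `1larg` and `2larg` to match the question. Ties are not handled in any particular way here.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
)

# Long format: one row per (Date, column name, value)
long = df.melt(id_vars="Date", var_name="col", value_name="value")

# Largest values first within each Date, then keep the top two per Date
top2 = (
    long.sort_values(["Date", "value"], ascending=[True, False])
    .groupby("Date")
    .head(2)
    .copy()
)

# Number the kept rows 1, 2 within each Date and pivot back to wide form
top2["rank"] = top2.groupby("Date").cumcount() + 1
result = top2.pivot(index="Date", columns="rank", values="col")
result.columns = ["1larg", "2larg"]

print(result)
```

With 500+ columns this reshapes a lot more data than the argsort approach, so it is probably the slower of the two; its main appeal is that each step is an ordinary pandas operation.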
