Pandas sort lambda function

Given a data frame "a" with three columns A, B, and C, and three rows of numerical values, how can all rows be sorted with a comparison operator that uses only the product A[i] * B[i]? It seems that the pandas sort only accepts column names and a sorting method.
I would like to use a comparison function like the one shown below.

f = lambda i,j: a['A'][i]*a['B'][i] < a['A'][j]*a['B'][j]
3 answers

There are at least two ways:

Method 1

Suppose you start with

In [175]: df = pd.DataFrame({'A': [1, 2], 'B': [1, -1], 'C': [1, 1]})

You can add a column which is your sort key

In [176]: df['sort_val'] = df.A * df.B

Finally, sort by it and drop it

In [190]: df.sort_values('sort_val').drop('sort_val', axis=1)
Out[190]: 
   A  B  C
1  2 -1  1
0  1  1  1
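
If modifying df in place is undesirable, the same idea can be written as a single chain with a temporary column. This is only a minimal sketch, assuming a reasonably recent pandas; the column name `_key` is a placeholder chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [1, -1], 'C': [1, 1]})

# Add a temporary key column (A * B), sort by it, then drop it again.
# assign() returns a new frame, so df itself is left untouched.
result = (
    df.assign(_key=df['A'] * df['B'])
      .sort_values('_key')
      .drop(columns='_key')
)
print(result)
```

Here df keeps only its original columns, and result holds the rows ordered by the product.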

Method 2

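Use numpy.argsort with positional indexing (.iloc here; the .ix indexer used originally has since been removed from pandas):

In [197]: import numpy as np

In [198]: df.iloc[np.argsort(df.A * df.B).values]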
Out[198]: 
   A  B  C
1  2 -1  1
0  1  1  1
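
Since argsort returns positions rather than index labels, it pairs naturally with positional indexing. A small sketch of that, assuming a frame with a non-default index (the labels 'x', 'y', 'z' and the third row are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, -1, 0], 'C': [1, 1, 1]},
                  index=['x', 'y', 'z'])

# The products are [1, -2, 0]; argsort returns the positions that sort them.
order = np.argsort((df['A'] * df['B']).to_numpy())

# iloc selects by position, so the index labels do not matter here.
result = df.iloc[order]
print(result)
```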

Another way, for anyone who finds this question via Google:

df.loc[(df.A * df.B).sort_values().index]

This is an alternative to @Ami Tavory's answer above.


Building on @srs's answer: iloc can be used instead of loc, and in the small test below iloc was faster than loc.

import numpy as np
import pandas as pd

N = 10000
df = pd.DataFrame({
                   'A': np.random.randint(low=1, high=N, size=N), 
                   'B': np.random.randint(low=1, high=N, size=N)
                  })

%%timeit -n 100
df['C'] = df['A'] * df['B']
df.sort_values(by='C')

New column plus sort_values: 100 loops, best of 3: 1.85

%%timeit -n 100
df.loc[(df.A * df.B).sort_values().index]

loc: 100 loops, best of 3: 2.69

%%timeit -n 100
df.iloc[(df.A * df.B).sort_values().index]

iloc: 100 loops, best of 3: 2.02

df['C'] = df['A'] * df['B']

df1 = df.sort_values(by='C')
df2 = df.loc[(df.A * df.B).sort_values().index]
df3 = df.iloc[(df.A * df.B).sort_values().index]

print(np.array_equal(df1.index, df2.index))
print(np.array_equal(df2.index, df3.index))

Test results (comparing the full index order across all three approaches):

True

True
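
One caveat worth keeping in mind: the loc and iloc variants agree above because df has the default 0..N-1 index, where labels and positions coincide. sort_values().index holds labels, so with any other index loc looks like the safer choice (given unique labels). A small sketch with made-up data, not part of the original benchmark:

```python
import pandas as pd

# Integer labels that do not match row positions.
df = pd.DataFrame({'A': [3, 1, 2], 'B': [1, 1, 1]}, index=[2, 0, 1])

key = df['A'] * df['B']            # values 3, 1, 2 under labels 2, 0, 1
labels = key.sort_values().index   # labels in sorted order: 0, 1, 2

by_label = df.loc[labels]          # looks rows up by label: sorted by A * B
by_position = df.iloc[labels]      # treats the labels as positions: not sorted

print(by_label)
print(by_position)
```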


Source: https://habr.com/ru/post/1654773/