Python pandas DataFrame sort_values not working

I have the following pandas DataFrame that I want to sort by 'test_type':

       test_type         tps          mtt        mem        cpu       90th
    0   sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
    1   sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
    2   sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
    3    sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

My code for loading the data frame and sorting it is below; the first print statement prints the data frame shown above.

    import pandas as pd

    df = pd.read_csv(file)  # reads from a CSV file; file is the path to it
    print(df)
    df = df.sort_values(by=['test_type'], ascending=True)
    print('\nAfter sort...')
    print(df)

After sorting, printing the data frame shows that it still looks like this:

Program output:

    After sort...
       test_type         tps          mtt        mem        cpu       90th
    0   sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
    1   sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
    2   sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
    3    sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

I expect row 3 (test_type sso_500) to be at the top after sorting. Can someone help me understand why the sort isn't working as expected?

2 answers

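sort_values is actually working: test_type holds strings, and strings are compared character by character, so 'sso_500' sorts after 'sso_2000' because '5' > '2'. Presumably, what you want is to sort by the numeric value after sso_. You can do it as follows: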

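    import numpy as np

    # np.argsort returns positions, so select the rows with .iloc
    # (.ix, deprecated since pandas 0.20, was removed in pandas 1.0)
    df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]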

This code:

  • splits each string on _

  • converts the part after the last _ to an integer

  • finds the positions that would sort those integers (np.argsort)

  • reorders the DataFrame's rows by those positions
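To make the four steps concrete, here is a minimal sketch that keeps each intermediate value, assuming a small frame containing only the question's test_type values. The names parts, numbers, order and sorted_df are illustrative, not from the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical frame holding only the question's test_type values
df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# Step 1: split each string on '_' -> lists like ['sso', '1000']
parts = df.test_type.str.split('_')

# Step 2: take the last piece of each list and convert it to an integer
numbers = parts.str[-1].astype(int)   # 1000, 1500, 2000, 500

# Step 3: positions that would sort those integers in ascending order
order = np.argsort(numbers.values)    # array([3, 0, 1, 2])

# Step 4: reorder the rows by position (.iloc, since argsort gives positions)
sorted_df = df.iloc[order]

print(sorted_df)
```

The original row labels are kept, so sorted_df has index 3, 0, 1, 2 with sso_500 first.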

Example

    In [15]: df = pd.DataFrame({'test_type': ['sso_1000', 'sso_500']})

    In [16]: df.sort_values(by=['test_type'], ascending=True)
    Out[16]:
      test_type
    0  sso_1000
    1   sso_500

    In [17]: df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]
    Out[17]:
      test_type
    1   sso_500
    0  sso_1000
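On pandas 1.1 or later, sort_values also accepts a key function that is applied to the column before sorting, so the numeric ordering can be requested directly. This is a swapped-in alternative to the argsort approach, not part of the answer above, and it assumes pandas >= 1.1:

```python
import pandas as pd

df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# key receives the 'test_type' column as a Series and must return a Series
# of the same length; here it maps 'sso_500' -> 500, 'sso_1000' -> 1000, etc.
result = df.sort_values(
    by='test_type',
    key=lambda s: s.str.split('_').str[-1].astype(int),
)
print(result)
```

As with argsort, the original row labels are preserved (3, 0, 1, 2), and the column values themselves are left untouched.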

Alternatively, you can extract the numbers from test_type, sort them, and reindex the DataFrame using the index of the sorted numbers.

    df.reindex(df['test_type'].str.extract(r'(\d+)', expand=False)
                 .astype(int).sort_values().index).reset_index(drop=True)
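Broken into named steps, the reindex approach looks like the sketch below. It assumes a trimmed copy of the question's data (only test_type and tps, values taken from the question); the names digits, new_order and result are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500'],
    'tps': [205.263559, 201.127133, 203.204082, 189.566836],
})

# First run of digits in each string, still as strings: '1000', '1500', ...
digits = df['test_type'].str.extract(r'(\d+)', expand=False)

# Sort the integer values; the resulting index holds the original row
# labels in numeric order
new_order = digits.astype(int).sort_values().index

# Reorder the rows by label, then renumber them 0..n-1
result = df.reindex(new_order).reset_index(drop=True)
print(result)
```

reset_index(drop=True) only renumbers the rows; leave it off if you want to keep the original row labels, as the argsort answer does.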



Source: https://habr.com/ru/post/1265068/

