Sorting Pandas Categorical labels after groupby

I use pd.cut to discretize a data set. Everything works great. However, I have a question about the Categorical object, which is the data type returned by pd.cut. The docs say a Categorical is treated like an array of strings, so I'm not surprised to see that the labels are lexically sorted when grouped.

For example, the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'value': np.random.randint(0, 10000, 100)})

# Build one label per 500-wide bin: "0 - 499", "500 - 999", ...
labels = []
for i in range(0, 10000, 500):
    labels.append("{0} - {1}".format(i, i + 499))

# Sort the rows by value before binning
df.sort_values(by='value', inplace=True, ascending=True)
df['value_group'] = pd.cut(df.value, range(0, 10500, 500), right=False, labels=labels)

df.groupby(['value_group'])['value_group'].count().plot(kind='bar')

It displays a bar chart in which the groups are in lexical order (notice 500 - 999 in the middle).

Before grouping, the structure is in the following order:

In [94]: df['value_group']
Out [94]: 
59        0 - 499
58        0 - 499
0       500 - 999
94      500 - 999
76      500 - 999
95     1000 - 1499
17     1000 - 1499
48     1000 - 1499

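I would like the groups to stay in the order of the labels. I could force that by prefixing each label with a char, e.g. ['A) 0 - 499', 'B) 500-999', ... ], but is there a way to keep the groups in label order without changing the labels?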


You can sort the index of the grouped result by the numeric lower bound of each label, for example:

In [104]: z = df.groupby('value_group').size()

In [105]: z[sorted(z.index, key=lambda x: float(x.split()[0]))]
Out[105]: 
0 - 499        5
500 - 999      6
1000 - 1499    4
1500 - 1999    6
2000 - 2499    4
2500 - 2999    6
3000 - 3499    3
3500 - 3999    3
4000 - 4499    2
4500 - 4999    6
5000 - 5499    6
5500 - 5999    5
6000 - 6499    6
6500 - 6999    2
7000 - 7499    9
7500 - 7999    3
8000 - 8499    7
8500 - 8999    6
9000 - 9499    5
9500 - 9999    6
dtype: int64

In [106]: z[sorted(z.index, key=lambda x: float(x.split()[0]))].plot(kind='bar')
Out[106]: <matplotlib.axes.AxesSubplot at 0xbe87d30>

demo with better order
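The same idea as a self-contained sketch. The counts below are made up for illustration, and `lower_bound` is a hypothetical helper name, not something from pandas:

```python
import pandas as pd

# Hypothetical counts whose string labels are in lexical order,
# as they would come out of a groupby on plain string labels.
z = pd.Series(
    [5, 4, 6],
    index=["0 - 499", "1000 - 1499", "500 - 999"],
)


def lower_bound(label):
    """Return the numeric lower bound of a label such as '500 - 999'."""
    return int(label.split("-")[0])


# Select the rows in the order given by the numeric lower bound.
ordered = z.loc[sorted(z.index, key=lower_bound)]
print(ordered)
```

Parsing the label only works while every label starts with its lower bound, so this is tied to the label format used in the question.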


Alternatively, reindex the grouped result with the labels sorted by their numeric lower bound:

group = df.groupby(['value_group'])['value_group'].count()
sortd = group.reindex(sorted(group.index, key=lambda x: int(x.split("-")[0])))

Then plot sortd, whose index is in numeric order.
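If the original label list is still available, a possible variation is to reindex with that list directly instead of parsing the labels. This is a minimal sketch with made-up values and a smaller bin range; the labels are converted to plain strings on purpose so that the lexical ordering from the question shows up:

```python
import pandas as pd

# Four 500-wide bins: "0 - 499", "500 - 999", "1000 - 1499", "1500 - 1999"
labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 2000, 500)]

# Made-up sample values, at least one per bin.
values = pd.Series([10, 600, 700, 1200, 1600, 1700, 1800])

# Bin the values, then convert to plain strings so that grouping
# sorts the keys lexically, as in the question.
groups = pd.cut(
    values, list(range(0, 2500, 500)), right=False, labels=labels
).astype(str)

group = values.groupby(groups).count()  # lexical order
sortd = group.reindex(labels)           # label order
print(sortd)
```

Reindexing with the label list also makes the intended order explicit, which helps when the labels do not start with a number.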


Alternatively, pass sort=False to groupby so that pandas does not sort the group keys:

df.groupby(['value_group'], sort=False)['value_group'].count().plot(kind='bar')
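For comparison, a minimal sketch that assumes a reasonably recent pandas, where pd.cut with labels returns an ordered Categorical and groupby sorts a categorical key by category order rather than lexically. The sample values are made up, and `observed=False` is passed explicitly only to avoid relying on the version-dependent default:

```python
import pandas as pd

# Four 500-wide bins: "0 - 499", "500 - 999", "1000 - 1499", "1500 - 1999"
labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 2000, 500)]

# Made-up, deliberately unsorted sample values.
df = pd.DataFrame({"value": [1700, 10, 600, 1200, 650, 1800, 1900]})

# Keep the categorical dtype returned by pd.cut.
df["value_group"] = pd.cut(
    df["value"], list(range(0, 2500, 500)), right=False, labels=labels
)

# With a categorical key, the groups come out in category order.
counts = df.groupby("value_group", observed=False).size()
print(counts)
```

Under that assumption no extra sorting step is needed, and `counts.plot(kind='bar')` would draw the bars in label order.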

Source: https://habr.com/ru/post/1541658/

