Sort Pandas Categorical tags after groupby

Question

I use pd.cut to discretize a data set. Everything works great. However, I have a question about the Categorical object, which is the data type returned by pd.cut. The docs say a Categorical is treated as an array of strings, so I'm not surprised to see that these labels are sorted lexically when grouped.

For example, the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'value': np.random.randint(0, 10000, 100)})

# Build one label per 500-wide bin: "0 - 499", "500 - 999", ...
labels = []
for i in range(0, 10000, 500):
    labels.append("{0} - {1}".format(i, i + 499))

# DataFrame.sort was removed from pandas; sort_values is its replacement
df.sort_values(by='value', inplace=True, ascending=True)
df['value_group'] = pd.cut(df.value, range(0, 10500, 500), right=False, labels=labels)

df.groupby(['value_group'])['value_group'].count().plot(kind='bar')

It displays the following diagram:

[image: bar chart of the group counts]

(notice 500 - 999 in the middle)

Before grouping, the structure is in the following order:

In [94]: df['value_group']
Out [94]: 
59        0 - 499
58        0 - 499
0       500 - 999
94      500 - 999
76      500 - 999
95     1000 - 1499
17     1000 - 1499
48     1000 - 1499

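The only workaround I can think of is to prefix each label with a char to force the order, e.g. ['A) 0 - 499', 'B) 500-999', ... ]. Is there a better way to keep the groups in their original order?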

+4

python pandas data-analysis

Bill 22 '14 18:15

3

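One option is to reindex the grouped result with the labels sorted by their leading number:

group = df.groupby(['value_group'])['value_group'].count()
# reindex_axis was removed from pandas; reindex does the same job here
sortd = group.reindex(sorted(group.index, key=lambda x: int(x.split("-")[0])))

Then use sortd, which is in numeric order, for the plot.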
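
To see the reindexing step in isolation, here is a minimal sketch on a tiny hand-made Series. The counts and the helper name lower_bound are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Hypothetical counts keyed by range labels, listed in lexical order
counts = pd.Series(
    [5, 4, 6],
    index=["0 - 499", "1000 - 1499", "500 - 999"],
)

def lower_bound(label):
    # "500 - 999" -> 500; int() tolerates the surrounding whitespace
    return int(label.split("-")[0])

# Sort the labels by their numeric lower bound, then reorder the Series
ordered_labels = sorted(counts.index, key=lower_bound)
sortd = counts.reindex(ordered_labels)
print(sortd)
```

Each count stays attached to its label; only the row order changes.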

+2

grasshopper 22 '14 18:46

Alternatively, pass sort=False to groupby so that it does not sort the group keys:

df.groupby(['value_group'], sort=False)['value_group'].count().plot(kind='bar')

0

A.Kot Oct 18 '17 at 16:05


DSM · Accepted Answer · 2014-05-22T18:47:43+0000

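One workaround is to sort the index labels by their leading number and then index the grouped result in that order: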

In [104]: z = df.groupby('value_group').size()

In [105]: z[sorted(z.index, key=lambda x: float(x.split()[0]))]
Out[105]: 
0 - 499        5
500 - 999      6
1000 - 1499    4
1500 - 1999    6
2000 - 2499    4
2500 - 2999    6
3000 - 3499    3
3500 - 3999    3
4000 - 4499    2
4500 - 4999    6
5000 - 5499    6
5500 - 5999    5
6000 - 6499    6
6500 - 6999    2
7000 - 7499    9
7500 - 7999    3
8000 - 8499    7
8500 - 8999    6
9000 - 9499    5
9500 - 9999    6
dtype: int64

In [106]: z[sorted(z.index, key=lambda x: float(x.split()[0]))].plot(kind='bar')
Out[106]: <matplotlib.axes.AxesSubplot at 0xbe87d30>
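
For anyone who wants to reproduce this outside an interactive session, the following is a self-contained sketch of the same idea. The fixed random seed, the names edges and z_sorted, and the observed=False argument are assumptions added here so that the script is repeatable and keeps empty bins; they are not part of the original session.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption) so the run is repeatable
rng = np.random.RandomState(0)
df = pd.DataFrame({"value": rng.randint(0, 10000, 100)})

edges = list(range(0, 10500, 500))
labels = ["{0} - {1}".format(lo, lo + 499) for lo in edges[:-1]]
df["value_group"] = pd.cut(df["value"], edges, right=False, labels=labels)

# observed=False keeps bins that happen to be empty
z = df.groupby("value_group", observed=False).size()

# Order the labels by their leading number, then select in that order
order = sorted(z.index, key=lambda x: float(x.split()[0]))
z_sorted = z.loc[order]
print(z_sorted)
```

On newer pandas versions the grouped result may already follow the category order, in which case the re-sort leaves it unchanged.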
