Failed to correctly sort Titanic dataset Cabin values

I have a number of Cabin values; the left column is the index, and the right column contains the cabin values. Using the sort_values method, I was only able to partially sort them.

x = Cabin_Fare = Cabin_Fare.sort_values(['Cabin'])

210      A31
186      A32
446      A34
1185     A34
1266     A34
807      A36
97       A 
24       A6 
175      A7 
1058     B10
738     B101
816     B102
1107     B11
330      B18
524      B18
171      B19
691      B20
660      D48
682      D49
626      D50
22       D56
783      D6 
276      D7 
628      D9 
430      E10
718      E101
304      E101
124      E101
461      E12
752      E121
1234     NaN
1252     NaN
1257     NaN
73       NaN
121      NaN

The problem is that, although the cabins are sorted by letter, the numbers attached to the letters are compared as text rather than as numbers (so A31 comes before A6, and B101 before B11). My desired result is:

97       A 
24       A6 
175      A7 
210      A31
186      A32
446      A34
1185     A34
1266     A34
807      A36
1058     B10
1107     B11
330      B18
524      B18
171      B19
691      B20
738     B101
816     B102
........

1234     NaN
1252     NaN
1257     NaN
73       NaN
121      NaN

I don't really care about the NaN values, but I would like them at the end of the series. A cabin value that is only a letter, such as a single "A", can have a "0" appended if necessary, but I want letters with no number attached to come first within their group.


 x.reindex(x[x.notnull()].str[1:].replace('', 0).astype(int).sort_values().index)

import pandas as pd

# setup regex for str.extract (raw string so \D and \d are not treated as string escapes)
# ?P<letter> tells pandas to make that a column with name 'letter'
regex = r'(?P<letter>\D+)(?P<digit>\d*)'
# easy access to column names I'm making in extract step
cols = ['letter', 'digit']

# run extract.  will pull out letter and digit
split_df = df.Cabin.str.extract(regex, expand=True)
# make sure digit column is numeric and fill with 0
split_df['digit'] = pd.to_numeric(split_df['digit'], errors='coerce').fillna(0)
# sort by cols gets us the right sort (rows with a NaN letter end up last)
split_df.sort_values(cols, inplace=True)
# use sorted split_df.index to reorder df (.ix was removed from pandas; .loc does the same here)
df = df.loc[split_df.index]
df.head(20)
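
The snippet above assumes a DataFrame named df already exists. As a minimal, self-contained sketch of the same extract-and-sort idea, the example below runs it on a small hand-made Series. The sample values and the helper name sort_cabins are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the index values mimic the question's output.
cabins = pd.Series(
    ["A31", "B101", "A", np.nan, "B10", "A6", "B18"],
    index=[210, 738, 97, 1234, 1058, 24, 330],
    name="Cabin",
)


def sort_cabins(s):
    """Return s ordered by deck letter, then by cabin number, NaN last."""
    parts = s.str.extract(r"(?P<letter>\D+)(?P<digit>\d*)", expand=True)
    # A bare letter such as "A" yields an empty digit string; treat it as 0.
    parts["digit"] = pd.to_numeric(parts["digit"], errors="coerce").fillna(0)
    # sort_values puts rows whose letter is NaN at the end by default.
    order = parts.sort_values(["letter", "digit"]).index
    return s.loc[order]


result = sort_cabins(cabins)
print(result)
```

Keeping the split columns in a separate frame means the original data is only reordered, never modified, so the "A" cabin does not need a literal "0" appended.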



letter, numbers = cabin[0], cabin[1:]
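
One way the letter/number split above could be turned into a sort key, shown here in plain Python without pandas. This is only a sketch: the function name cabin_key and the sample list are hypothetical, and it assumes each cabin is a single letter optionally followed by digits.

```python
import math


def cabin_key(cabin):
    """Sort key: (is_missing, deck letter, cabin number)."""
    # Missing values (None or float NaN) sort after every real cabin.
    if cabin is None or (isinstance(cabin, float) and math.isnan(cabin)):
        return (1, "", 0)
    letter, numbers = cabin[0], cabin[1:]
    # A bare letter such as "A" has no number; use 0 so it comes first.
    return (0, letter, int(numbers) if numbers.isdigit() else 0)


# Hypothetical sample values for illustration.
cabins = ["A31", "B101", "A", float("nan"), "B10", "A6"]
ordered = sorted(cabins, key=cabin_key)
print(ordered)
```

Values that do not fit the assumed format (for example, several cabins in one string) fall back to number 0 within their letter group rather than raising an error.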

Source: https://habr.com/ru/post/1658867/
