I think you need to_numeric, because a string that holds a float (such as '17.44') cannot be cast directly to int:
data_df['grade'] = pd.to_numeric(data_df['grade']).astype(int)
Another solution is to cast to float first, and then to int:
data_df['grade'] = data_df['grade'].astype(float).astype(int)
Example:
import pandas as pd

data_df = pd.DataFrame({'grade':['10','20','17.44']})
print(data_df)
   grade
0     10
1     20
2  17.44

data_df['grade'] = pd.to_numeric(data_df['grade']).astype(int)
print(data_df)
   grade
0     10
1     20
2     17

data_df['grade'] = data_df['grade'].astype(float).astype(int)
print(data_df)
   grade
0     10
1     20
2     17
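As a minimal sketch of why the intermediate float step matters, the snippet below compares a direct cast with the two approaches above. The variable names (`grades`, `direct_cast_failed`, and so on) are illustrative and not part of the original question. Note that astype(int) truncates the fractional part rather than rounding.

```python
import pandas as pd

grades = pd.Series(['10', '20', '17.44'])

# Casting the strings straight to int fails, because '17.44'
# is not a valid integer literal.
try:
    grades.astype(int)
    direct_cast_failed = False
except ValueError:
    direct_cast_failed = True

# Parsing to a float first works; astype(int) then drops the fraction.
via_to_numeric = pd.to_numeric(grades).astype(int)
via_float = grades.astype(float).astype(int)

print(direct_cast_failed)
print(via_to_numeric.tolist())
print(via_float.tolist())
```

If rounding is wanted instead of truncation, calling .round() before .astype(int) is one option.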
---
If some values cannot be converted and to_numeric raises an error:
ValueError: Unable to parse string
you can add the parameter errors='coerce' to convert the non-numeric values to NaN.
If the column contains NaN values, it cannot be cast to int, see the docs:
data_df = pd.DataFrame({'grade':['10','20','17.44', 'aa']})
print(data_df)
   grade
0     10
1     20
2  17.44
3     aa

data_df['grade'] = pd.to_numeric(data_df['grade'], errors='coerce')
print(data_df)
   grade
0  10.00
1  20.00
2  17.44
3    NaN
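To illustrate the point about NaN, here is a small sketch (with illustrative variable names) showing that the coerced column is float64 and that casting it to int raises an error while NaN is still present:

```python
import pandas as pd

# errors='coerce' turns the unparseable 'aa' into NaN,
# which forces the result to a float dtype.
coerced = pd.to_numeric(pd.Series(['10', '17.44', 'aa']), errors='coerce')
print(coerced.dtype)

# A float column that still contains NaN cannot be cast to int.
try:
    coerced.astype(int)
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

print(nan_cast_failed)
```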
If you want to replace NaN with a number, for example 0, use fillna:
data_df['grade'] = pd.to_numeric(data_df['grade'], errors='coerce').fillna(0).astype(int)
print(data_df)
   grade
0     10
1     20
2     17
3      0
A small tip:
Before using errors='coerce', check all rows whose values cannot be converted to numeric, using boolean indexing:
print(data_df[pd.to_numeric(data_df['grade'], errors='coerce').isnull()])
  grade
3    aa
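One possible refinement of this check, sketched below, is to flag only the rows that were not already missing before the conversion, so that genuinely empty cells are not reported as bad input. The helper name `split_unparseable` and the sample data are hypothetical, chosen only for illustration.

```python
import pandas as pd

def split_unparseable(df, column):
    """Return (rows that fail numeric parsing, the coerced numeric column)."""
    numeric = pd.to_numeric(df[column], errors='coerce')
    # NaN after coercion, but present before it: the value was unparseable.
    bad_mask = numeric.isna() & df[column].notna()
    return df[bad_mask], numeric

data_df = pd.DataFrame({'grade': ['10', '20', '17.44', 'aa', None]})
bad_rows, numeric = split_unparseable(data_df, 'grade')

# Only the 'aa' row is reported; the originally missing value is not.
print(bad_rows)
```

After inspecting `bad_rows`, the coerced column can be filled and cast as shown earlier, for example with numeric.fillna(0).astype(int).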