I am using the Kaggle Titanic dataset and trying to fill in null values. When I run this:
combined_df.isnull().sum()
I get this:
Age 263
Embarked 2
Fare 1
Parch 0
PassengerId 0
Pclass 0
Sex 0
SibSp 0
Survived 418
fam_size 0
Title 0
dtype: int64
So, to fill in the null values, I do the following:
combined_df.Age.fillna(combined_df.Age.mean(), inplace=True)
combined_df.Embarked.fillna(combined_df.Embarked.mode(), inplace=True)
combined_df.Fare.fillna(combined_df.Fare.mean(), inplace=True)
So, when I run this now:
combined_df.isnull().sum()
I get:
Age 0
Embarked 2
Fare 0
Parch 0
PassengerId 0
Pclass 0
Sex 0
SibSp 0
Survived 418
fam_size 0
Title 0
dtype: int64
So this handles the Age and Fare columns correctly, but Embarked still has two null values, as before.
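A minimal sketch that reproduces this on a toy Series (the values and the variable names here are made up for illustration, not taken from the Titanic data). It suggests one likely cause: mode() returns a Series rather than a scalar, and fillna aligns a Series argument by index label, so only a null at label 0 would be filled. Taking the first element of the mode seems to avoid that:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Embarked column (hypothetical values).
embarked = pd.Series(["S", "C", np.nan, "S", np.nan, "Q"], name="Embarked")

# mode() returns a Series, because a column can have several modes.
mode_result = embarked.mode()
print(type(mode_result))
print(mode_result)

# Passing that Series to fillna aligns on index labels: the mode lives at
# label 0, but the nulls are at labels 2 and 4, so nothing gets filled.
filled_with_series = embarked.fillna(mode_result)
print(filled_with_series.isnull().sum())  # 2

# Passing the first mode as a scalar fills every null.
filled_with_scalar = embarked.fillna(mode_result.iloc[0])
print(filled_with_scalar.isnull().sum())  # 0
```

mean() returns a scalar, which would explain why the same pattern works for Age and Fare.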
What is strange is that when I run:
combined_df.Embarked.value_counts()
I get back:
S 914
C 270
Q 123
Name: Embarked, dtype: int64
So it looks like Embarked has no null values?
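For what it is worth, value_counts() leaves out nulls by default (its dropna parameter defaults to True), so it cannot show them unless asked to. A small sketch on a made-up Series (again hypothetical data, not the Titanic column):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Embarked column (hypothetical values).
embarked = pd.Series(["S", "C", np.nan, "S", np.nan, "Q"], name="Embarked")

# Default behaviour: NaN is dropped, so only S, C and Q are counted.
counts_default = embarked.value_counts()
print(counts_default.sum())  # 4

# With dropna=False the nulls get their own row in the result.
counts_with_nan = embarked.value_counts(dropna=False)
print(counts_with_nan)
print(counts_with_nan.sum())  # 6
```

So a value_counts() output with no null row is consistent with isnull().sum() still reporting nulls.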
This is very confusing; any suggestions?
Thanks!