Pandas: Difficulty filling null values

I am using the Kaggle Titanic dataset and trying to fill in the null values. Running this:

combined_df.isnull().sum()

I get this:

Age            263
Embarked         2
Fare             1
Parch            0
PassengerId      0
Pclass           0
Sex              0
SibSp            0
Survived       418
fam_size         0
Title            0
dtype: int64

So, to fill in the null values, I do the following:

combined_df.Age.fillna(combined_df.Age.mean(), inplace=True)
combined_df.Embarked.fillna(combined_df.Embarked.mode(), inplace=True)
combined_df.Fare.fillna(combined_df.Fare.mean(), inplace=True)

So, when I run this now:

combined_df.isnull().sum()

I get:

Age              0
Embarked         2
Fare             0
Parch            0
PassengerId      0
Pclass           0
Sex              0
SibSp            0
Survived       418
fam_size         0
Title            0
dtype: int64

So the Age and Fare columns are handled correctly, but Embarked still has two null values, as before.

Curiously, when I run:

combined_df.Embarked.value_counts()

I get back:

S    914
C    270
Q    123
Name: Embarked, dtype: int64

So it looks like Embarked has no null values?

I'm very confused; any suggestions?

Thanks!

2 answers

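The problem is that mode returns a Series, not a scalar (there can be several modes, and all of them are returned). fillna with a Series fills by matching labels (for Series.fillna that means the index, so only a null at a matching index label is filled).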

df = pd.DataFrame({'Emb': ['S', 'Q', 'C',  np.nan, 'Q', None]})
df
    Emb
0     S
1     Q
2     C
3   NaN
4     Q
5  None
df.fillna(df.Emb.mode())
    Emb
0     S
1     Q
2     C
3   NaN
4     Q
5  None
df.fillna(df.Emb.mode()[0])
  Emb
0   S
1   Q
2   C
3   Q
4   Q
5   Q
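
Applied to the columns from the question, the same fix might look like the sketch below. The small combined_df built here is a hypothetical stand-in for the real Titanic data, and most_common_port is a name introduced only for illustration. Assigning the result back to the column is used instead of inplace=True on an attribute, which is a stylistic choice rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for combined_df from the question.
combined_df = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Fare": [7.25, 8.05, np.nan, 10.0],
    "Embarked": ["S", np.nan, "S", "C"],
})

# mean() returns a scalar, so it can be passed to fillna directly.
combined_df["Age"] = combined_df["Age"].fillna(combined_df["Age"].mean())
combined_df["Fare"] = combined_df["Fare"].fillna(combined_df["Fare"].mean())

# mode() returns a Series; take its first element to get a scalar.
most_common_port = combined_df["Embarked"].mode()[0]
combined_df["Embarked"] = combined_df["Embarked"].fillna(most_common_port)

# Every column should now report zero nulls.
print(combined_df.isnull().sum())
```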

To see the index alignment in action:

mode = df.Emb.mode()
mode
0    Q
dtype: object
df.Emb.fillna(mode)
0      S
1      Q
2      C
3    NaN
4      Q
5    NaN
Name: Emb, dtype: object
mode.index = [5]
mode
5    Q
dtype: object
df.Emb.fillna(mode)
0      S
1      Q
2      C
3    NaN
4      Q
5      Q
Name: Emb, dtype: object
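
The reason mode returns a Series becomes visible when two values are tied. The following is a minimal sketch on a hypothetical column (ports, modes and filled are illustrative names, not from the question); choosing the first mode with .iloc[0] is one possible tie-breaking rule, not the only one.

```python
import pandas as pd

# Hypothetical column in which 'S' and 'C' are tied for most frequent.
ports = pd.Series(["S", "C", "S", "C", "Q", None], name="Emb")

# mode() returns every tied value, in sorted order.
modes = ports.mode()
print(list(modes))

# Pick one mode explicitly (here the first) so fillna receives a scalar.
filled = ports.fillna(modes.iloc[0])
print(filled.tolist())
```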

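value_counts drops null values by default, which is why they do not show up in its output. Pass dropna=False to value_counts to include them: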

combined_df.Embarked.value_counts(dropna=False)
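
As a small illustration of the difference, here is a sketch on a hypothetical six-row column (emb is an assumed name, not the real Titanic data). With the default settings the counts cover only non-null rows; with dropna=False the nulls get a row of their own, so the counts add up to the length of the column.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values.
emb = pd.Series(["S", "Q", "C", np.nan, "Q", np.nan], name="Emb")

# Default: NaN rows are left out of the counts.
default_counts = emb.value_counts()
print(default_counts)

# dropna=False: NaN is counted as its own category.
counts = emb.value_counts(dropna=False)
print(counts)

# The counts now cover every row of the column.
print(counts.sum() == len(emb))
```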

Source: https://habr.com/ru/post/1692779/

