Difference between comparison with np.nan and isnull()

I assumed that

data[data.agefm.isnull()]

and

data[data.agefm == numpy.nan]

are equivalent. But they are not: the first returns the rows where agefm is NaN, while the second returns an empty DataFrame. I thought that missing values are always equal to np.nan, but this seems to be wrong.

The agefm column has type float64:

(Pdb) data.agefm.describe()
count    2079.000000
mean       20.686388
std         5.002383
min        10.000000
25%        17.000000
50%        20.000000
75%        23.000000
max        46.000000
Name: agefm, dtype: float64

Could you please explain what data[data.agefm == np.nan] actually means?

1 answer

np.nan is not comparable to np.nan... directly.

np.nan == np.nan

False

While

np.isnan(np.nan)

True

You can also do

pd.isnull(np.nan)

True
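To connect this back to the question: data.agefm == np.nan is an elementwise comparison, so it builds a boolean mask with one entry per row, and every entry is False because NaN never compares equal to anything. Indexing with an all-False mask gives an empty result. Here is a minimal sketch; the three-value agefm Series is a made-up stand-in for the real column, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the agefm column from the question.
agefm = pd.Series([17.0, np.nan, 23.0], name="agefm")

# Elementwise equality against NaN: False everywhere, even on the NaN row.
eq_mask = agefm == np.nan
print(eq_mask.tolist())    # [False, False, False]

# isnull() tests for missingness rather than equality.
null_mask = agefm.isnull()
print(null_mask.tolist())  # [False, True, False]
```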

Examples
Filters nothing, because nothing is equal to np.nan

s = pd.Series([1., np.nan, 2.])
s[s != np.nan]

0    1.0
1    NaN
2    2.0
dtype: float64

Filters out the null values

s = pd.Series([1., np.nan, 2.])
s[s.notnull()]

0    1.0
2    2.0
dtype: float64

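You can also use this odd comparison behavior to get what you want. Since np.nan != np.nan is True, s == s is False only where the value is NaN,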

s = pd.Series([1., np.nan, 2.])
s[s == s]

0    1.0
2    2.0
dtype: float64
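As a quick check, for a float Series like this one the self-comparison mask should match the mask from notnull(). This is a small sketch, assuming a float64 column as in the question; other dtypes may behave differently.

```python
import numpy as np
import pandas as pd

s = pd.Series([1., np.nan, 2.])

# NaN is the only float value that is not equal to itself,
# so s == s is False exactly at the NaN position.
self_eq = s == s
not_null = s.notnull()

print(self_eq.tolist())          # [True, False, True]
print(self_eq.equals(not_null))  # True
```

notnull() states the intent more directly, so it is probably the easier of the two to read.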

Or just use dropna

s = pd.Series([1., np.nan, 2.])
s.dropna()

0    1.0
2    2.0
dtype: float64
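Applied to a DataFrame like the one in the question, the same ideas look roughly as follows. The frame below is invented for illustration (the id column and all values are assumptions); only the column name agefm comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name agefm comes from the question.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "agefm": [17.0, np.nan, 23.0, np.nan],
})

# Rows where agefm is missing.
missing = data[data.agefm.isnull()]

# Always empty: no value, NaN included, compares equal to np.nan.
empty = data[data.agefm == np.nan]

# Rows where agefm is present, looking only at that column.
present = data.dropna(subset=["agefm"])

print(len(missing), len(empty), len(present))  # 2 0 2
```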

Source: https://habr.com/ru/post/1664981/

