Find indices where df is null

In pandas (master branch or upcoming 0.14), how can I find the indices where my dataframe is null?

When I do this:

df.isnull()

I get a boolean DataFrame the same size as df.

If I do this:

df.isnull().index

I get the index of the original df.

What I want is the indices of the rows with NaN elements (either in some column, or in all columns).

2 answers

I would drop down to NumPy to make it a little faster:

In [11]: df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]])

In [12]: df
Out[12]:
    0   1
0 NaN   1
1   0 NaN
2   1   2

In [13]: pd.isnull(df.values)
Out[13]:
array([[ True, False],
       [False,  True],
       [False, False]], dtype=bool)

In [14]: pd.isnull(df.values).any(1)
Out[14]: array([ True,  True, False], dtype=bool)

In [15]: np.nonzero(pd.isnull(df.values).any(1))
Out[15]: (array([0, 1]),)

In [16]: df.index[np.nonzero(pd.isnull(df.values).any(1))]
Out[16]: Int64Index([0, 1], dtype='int64')
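
The steps above can be collected into a small helper. This is only a sketch: the function name nan_row_positions and the string index labels are made up for illustration, and np.flatnonzero is used in place of np.nonzero so that a flat array comes back instead of a one-element tuple.

```python
import numpy as np
import pandas as pd


def nan_row_positions(df):
    """Return integer positions of rows that contain at least one NaN.

    Hypothetical helper, not part of pandas.
    """
    # Work on the underlying ndarray to skip pandas overhead.
    mask = pd.isnull(df.values).any(axis=1)
    # flatnonzero gives a 1-D array of positions where mask is True.
    return np.flatnonzero(mask)


# Same data as above, but with string labels (an assumption for this
# example) so that positions and index labels are easy to tell apart.
df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]], index=["a", "b", "c"])

positions = nan_row_positions(df)  # integer positions
labels = df.index[positions]       # corresponding index labels

print(positions)
print(list(labels))
```

With a default integer index the positions and the labels coincide, which is why Out[15] and Out[16] above show the same numbers.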

To see some timings with a slightly larger df:

In [21]: df = pd.DataFrame([[np.nan, 1], [0, np.nan], [1, 2]] * 1000)

In [22]: %timeit np.nonzero(pd.isnull(df.values).any(1))
10000 loops, best of 3: 85.8 µs per loop

In [23]: %timeit df.index[df.isnull().any(1)]
1000 loops, best of 3: 629 µs per loop

and if you care about the index (not the position):

In [24]: %timeit df.index[np.nonzero(pd.isnull(df.values).any(1))]
10000 loops, best of 3: 172 µs per loop
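
df.index[df.isnull().any(axis=1)]

.any(axis=1) returns True/False for each row, depending on whether any element in that row is NaN. Indexing df.index with that boolean mask then gives the indices of the rows where df is null.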
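
The question also asks about rows that are NaN in all columns. A minimal sketch of both cases, assuming a small made-up frame (the variable names null_mask, rows_any and rows_all are illustrative only); swapping .any for .all is the only change needed:

```python
import numpy as np
import pandas as pd

# Row 0 has one NaN, row 1 is entirely NaN, row 2 has none.
df = pd.DataFrame([[np.nan, 1], [np.nan, np.nan], [1, 2]])

null_mask = df.isnull()

# Rows with a NaN in at least one column.
rows_any = df.index[null_mask.any(axis=1)]

# Rows with a NaN in every column.
rows_all = df.index[null_mask.all(axis=1)]

print(list(rows_any))
print(list(rows_all))
```

Passing axis as a keyword, as this answer does, is the safer spelling: recent pandas releases no longer accept the positional form df.isnull().any(1).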


Source: https://habr.com/ru/post/1541829/

