Get indices of NaN values in pandas frame

Question

For every row containing NaN values, I am trying to get the indices of the columns where the NaNs occur.

    import pandas as pd
    from numpy import nan as NaN

    d = [[11.4, 1.3, 2.0, NaN],
         [11.4, 1.3, NaN, NaN],
         [11.4, 1.3, 2.8, 0.7],
         [NaN, NaN, 2.8, 0.7]]
    df = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])
    print(df)

          A    B    C    D
    0  11.4  1.3  2.0  NaN
    1  11.4  1.3  NaN  NaN
    2  11.4  1.3  2.8  0.7
    3   NaN  NaN  2.8  0.7

I have already done the following:

add a column with the NaN count for each row
get indices of each row containing NaN values

What I want is a list like this (ideally with the column names):

 [ ['D'],['C','D'],['A','B'] ]

I hope to find a way that avoids testing each column of each row like this:

 if df.ix[i][column] == NaN:

I am looking for a pandas way to be able to handle my huge dataset.

Thanks in advance.

+5

python pandas machine-learning

dooms Nov 10 '15 at 10:55

4 answers

To get the coordinates of the null values, it should be efficient to use a scipy sparse matrix in coordinate (COO) format:

    import scipy.sparse as sp

    x, y = sp.coo_matrix(df.isnull()).nonzero()
    print(list(zip(x, y)))

    [(0, 3), (1, 2), (1, 3), (3, 0), (3, 1)]

Note that I call the nonzero method only to get the coordinates of the nonzero entries in the underlying sparse matrix, since I don't need the actual values, which are all True.
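If scipy is not available, the same coordinates can probably be obtained with NumPy alone. The following is a minimal sketch, not part of the original answer: it assumes the frame fits in memory as a dense boolean array, and the names `mask`, `positions` and `labels` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
)

# Dense boolean array: True where the frame holds NaN.
mask = df.isnull().to_numpy()

# np.nonzero returns one array of row positions and one of column positions.
rows, cols = np.nonzero(mask)
positions = [(int(r), int(c)) for r, c in zip(rows, cols)]

# Translate positions into (row label, column name) pairs.
labels = [(df.index[r], df.columns[c]) for r, c in positions]

print(positions)
print(labels)
```

The positional pairs match the ones printed by the sparse-matrix version; the `labels` list is an extra step for readers who want the column names rather than integer positions.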

+3

maxymoo Nov 10 '15 at 23:12

You can iterate through each row in the dataframe, create a mask of null values, and collect the index labels where the mask is True (i.e. the column names of the dataframe).

    lst = []
    for _, row in df.iterrows():
        mask = row.isnull()
        lst += [row[mask].index.tolist()]

    >>> lst
    [['D'], ['C', 'D'], [], ['A', 'B']]
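Since iterrows builds a Series for every row, it may be slow on a large frame. A possible variation, sketched here under the assumption that rows without any NaN should be skipped (as in the list the question asks for), computes the mask once and loops over the plain boolean array. The name `nan_columns` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
)

# Compute the null mask for the whole frame in one vectorized call.
mask = df.isnull().to_numpy()

# For each row that has at least one NaN, pick the matching column names.
nan_columns = [list(df.columns[row]) for row in mask if row.any()]

print(nan_columns)  # [['D'], ['C', 'D'], ['A', 'B']]
```

Dropping the `if row.any()` filter would keep an empty list for row 2, matching the output of the loop above.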

+1

Alexander Nov 10 '15 at 23:14

Another easy way:

    >>> df.isnull().any(axis=1)
    0     True
    1     True
    2    False
    3     True
    dtype: bool

To subset the rows that contain NaN:

    >>> bool_idx = df.isnull().any(axis=1)
    >>> df[bool_idx]
          A    B    C    D
    0  11.4  1.3  2.0  NaN
    1  11.4  1.3  NaN  NaN
    3   NaN  NaN  2.8  0.7

To get the integer index:

    >>> df[bool_idx].index
    Int64Index([0, 1, 3], dtype='int64')

0

muon Dec 03 '17 at 0:32

Andy hayden · Accepted Answer · 2015-11-10T23:30:10+0000

Another way is to extract the entries which are NaN:

    In [11]: df_null = df.isnull().unstack()

    In [12]: t = df_null[df_null]

    In [13]: t
    Out[13]:
    A  3    True
    B  3    True
    C  1    True
    D  0    True
       1    True
    dtype: bool

This will get you most of the way and may be enough.
Although it might be easier to work with the series:

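    In [14]: t2 = t.swaplevel().sort_index()  # t2 was not defined in the original; assumed to be t indexed by (row, column)
        ...: s = pd.Series(t2.index.get_level_values(1), t2.index.get_level_values(0))

    In [15]: s
    Out[15]:
    0    D
    1    C
    1    D
    3    A
    3    B
    dtype: object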

E.g. if you need lists (although I don't think you will need them):

    In [16]: s.groupby(level=0).apply(list)
    Out[16]:
    0       [D]
    1    [C, D]
    3    [A, B]
    dtype: object
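For reference, the steps above can be put together as one standalone script. This is only a sketch: it uses stack() instead of unstack() so that the index comes out ordered by (row, column) directly, which is an assumption about what the session intended, and the names `stacked`, `nan_pairs` and `per_row` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
)

# stack() yields a boolean Series indexed by (row label, column name).
stacked = df.isnull().stack()

# Keep only the entries that are True, i.e. the NaN cells.
nan_pairs = stacked[stacked]

# Series of column names indexed by row label.
s = pd.Series(
    nan_pairs.index.get_level_values(1),
    index=nan_pairs.index.get_level_values(0),
)

# One list of column names per row that contains NaN.
per_row = s.groupby(level=0).apply(list)

print(per_row.to_dict())
```

Calling `per_row.tolist()` instead gives the bare list of lists, without the row labels.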
