Display missing values for a specific column based on another specific column

Question

Here is my problem.

Let's say I have two columns in a DataFrame that look like this:

 Type | Killed
______|________
 Dog  | 1
 Dog  | nan
 Dog  | nan
 Cat  | 4
 Cat  | nan
 Cow  | 1
 Cow  | nan

I would like to find all the missing values in Killed, grouped by Type, and count them.

The desired result would look something like this:

Type | Sum(isnull)
Dog  | 2
Cat  | 1
Cow  | 1

Is there any way to show this?

python pandas nan dataframe multiple-columns

Niche.p Sep 01 '16 at 5:45

2 answers

I can get you both the isnull and the notnull counts:

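import numpy as np
# Label each row by whether Killed is missing, then count rows per Type and label
isnull = np.where(df.Killed.isnull(), 'isnull', 'notnull')
df.groupby([df.Type, isnull]).size().unstack()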
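
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and `fill_value=0` is my own addition (not in the snippet above) so that a Type with no rows for one label shows 0 instead of NaN.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Dog", "Cat", "Cat", "Cow", "Cow"],
    "Killed": [1, np.nan, np.nan, 4, np.nan, 1, np.nan],
})

# Label each row by whether Killed is missing
labels = np.where(df["Killed"].isnull(), "isnull", "notnull")

# Count rows per (Type, label) pair, then pivot the labels into columns
counts = df.groupby([df["Type"], labels]).size().unstack(fill_value=0)
print(counts)
```

The result has one row per Type and two columns, `isnull` and `notnull`, so `counts["isnull"]` alone gives the numbers asked for in the question.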

piRSquared Sep 01 '16 at 6:11

jezrael · Accepted Answer · 2016-09-01T05:48:47+0000

You can use boolean indexing with value_counts (use .loc here, since .ix was deprecated in pandas 0.20 and removed in 1.0):

print (df.loc[df.Killed.isnull(), 'Type'].value_counts().reset_index(name='Sum(isnull)'))

  index  Sum(isnull)
0   Dog            2
1   Cow            1
2   Cat            1
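
As a runnable sketch of the same idea, assuming the sample data from the question: the `rename_axis("Type")` call is my own addition, because the name of the first column after `reset_index` differs between pandas versions (`index` in older releases, as shown above; `Type` in pandas 2.x). Naming the index explicitly should give the same column names either way.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Dog", "Cat", "Cat", "Cow", "Cow"],
    "Killed": [1, np.nan, np.nan, 4, np.nan, 1, np.nan],
})

# Keep only the Type values of rows where Killed is missing,
# then count how often each Type occurs
result = (
    df.loc[df["Killed"].isnull(), "Type"]
    .value_counts()
    .rename_axis("Type")  # explicit index name, independent of pandas version
    .reset_index(name="Sum(isnull)")
)
print(result)
```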

Or filter first and aggregate with size, which looks faster:

print (df[df.Killed.isnull()]
            .groupby('Type')['Killed']
            .size()
            .reset_index(name='Sum(isnull)'))

  Type  Sum(isnull)
0  Cat           1
1  Cow           1
2  Dog           2
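
One thing to be aware of: both approaches above filter the rows first, so a Type with no missing values drops out of the result entirely. If you want such types listed with a count of 0, one possible variant (not one of the two methods above, just a sketch) is to sum the boolean mask per group. The extra `Pig` row below is hypothetical and is only there to show the zero case.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical Type ("Pig")
# that has no missing values
df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Dog", "Cat", "Cat", "Cow", "Cow", "Pig"],
    "Killed": [1, np.nan, np.nan, 4, np.nan, 1, np.nan, 3],
})

# isnull() gives a boolean Series; summing it per Type counts the True values
missing_per_type = (
    df["Killed"].isnull()
    .groupby(df["Type"])
    .sum()
    .reset_index(name="Sum(isnull)")
)
print(missing_per_type)
```

Here `Pig` appears with a count of 0, whereas the filtered versions would leave it out.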

Timings

df = pd.concat([df]*1000).reset_index(drop=True)

In [30]: %timeit (df.ix[df.Killed.isnull(), 'Type'].value_counts().reset_index(name='Sum(isnull)'))
100 loops, best of 3: 5.36 ms per loop

In [31]: %timeit (df[df.Killed.isnull()].groupby('Type')['Killed'].size().reset_index(name='Sum(isnull)'))
100 loops, best of 3: 2.02 ms per loop
