Python pandas - remove group based on collective NaN

Question

I have a dataset from several weather stations, with several variables (temperature, pressure, etc.) per station:

stationID | Time | Temperature | Pressure |...
----------+------+-------------+----------+
123       |  1   |     30      |  1010.5  |
123       |  2   |     31      |  1009.0  |
202       |  1   |     24      |  NaN     |
202       |  2   |     24.3    |  NaN     |
202       |  3   |     NaN     |  1000.3  |
...

I would like to remove the "stationID" groups that have more than a certain number of NaNs, counted across all the variables.

If I try

df.loc[df.groupby('stationID')['Temperature'].filter(lambda x: len(x[pd.isnull(x)]) < 30).index]

it works as shown here: Python pandas - remove groups based on NaN threshold

But in the above example, only "Temperature" is taken into account. How can I take into account the collective sum of NaNs across all available variables? That is, I would like to keep a group only if the total number of NaNs in [variable1, variable2, variable3, ...] is less than the threshold, and remove it otherwise.

python pandas

mmeclimate · 2016-07-25 18:20

Psidom · Accepted Answer · 2016-07-25T18:27:09+0000

You can filter the groups on the total number of NaNs across all columns:

df.groupby('stationID').filter(lambda g: g.isnull().sum().sum() < 4)

Here 4 is the threshold. With it, both stations are kept, because station 202 has only 3 NaNs in total:

df.groupby('stationID').filter(lambda g: g.isnull().sum().sum() < 4)

   stationID    Time    Temperature Pressure
0        123       1           30.0   1010.5
1        123       2           31.0   1009.0
2        202       1           24.0      NaN
3        202       2           24.3      NaN
4        202       3            NaN   1000.3
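
To see why the threshold matters, it can help to print the NaN total per station first. This is only a sketch: the DataFrame below is rebuilt by hand from the question's table, and the names `nan_per_row` and `nan_per_station` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "stationID": [123, 123, 202, 202, 202],
    "Time": [1, 2, 1, 2, 3],
    "Temperature": [30, 31, 24, 24.3, np.nan],
    "Pressure": [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

# Count NaNs in each row over the measurement columns,
# then add those row counts up per station
nan_per_row = df[["Temperature", "Pressure"]].isnull().sum(axis=1)
nan_per_station = nan_per_row.groupby(df["stationID"]).sum()
print(nan_per_station)
```

Station 123 has 0 NaNs and station 202 has 3, so `< 4` keeps both stations, while any threshold of 3 or lower drops station 202.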


df.groupby('stationID').filter(lambda g: g.isnull().sum().sum() < 3)

   stationID    Time    Temperature Pressure
0        123       1           30.0   1010.5
1        123       2           31.0   1009.0
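
If only some of the variables should count towards the threshold, one possible variation is to restrict the NaN count to a list of columns and build a boolean mask with `transform`. This is a sketch rather than part of the answer above: the helper name `keep_groups_below_nan_threshold` and its parameters are hypothetical, and the sample data is again rebuilt from the question's table.

```python
import numpy as np
import pandas as pd

def keep_groups_below_nan_threshold(df, key, columns=None, threshold=3):
    """Keep rows of groups whose total NaN count over `columns` is < threshold.

    Hypothetical helper for illustration; `columns` defaults to every
    column except the grouping key.
    """
    if columns is None:
        columns = [c for c in df.columns if c != key]
    # NaNs per row over the chosen columns
    nan_per_row = df[columns].isnull().sum(axis=1)
    # Broadcast each group's NaN total back onto its rows
    nan_per_group = nan_per_row.groupby(df[key]).transform("sum")
    return df[nan_per_group < threshold]

# Sample data copied from the table in the question
df = pd.DataFrame({
    "stationID": [123, 123, 202, 202, 202],
    "Time": [1, 2, 1, 2, 3],
    "Temperature": [30, 31, 24, 24.3, np.nan],
    "Pressure": [1010.5, 1009.0, np.nan, np.nan, 1000.3],
})

all_vars = keep_groups_below_nan_threshold(df, "stationID", threshold=3)
temp_only = keep_groups_below_nan_threshold(
    df, "stationID", columns=["Temperature"], threshold=2
)
```

With all variables counted, station 202 (3 NaNs) is dropped at `threshold=3`. Counting only `Temperature`, it has a single NaN and survives `threshold=2`.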
