Python pandas - remove groups based on NaN threshold

I have a dataset of readings from different weather stations:

stationID | Time | Temperature | ...
----------+------+-------------+-------
123       |  1   |     30      |
123       |  2   |     31      |
202       |  1   |     24      |
202       |  2   |     24.3    |
202       |  3   |     NaN     |
...

I would like to remove the "stationID" groups that have more than a certain number of NaN values. For example, if I type:

**>>> df.groupby('stationID')**

then I would like to remove the groups that contain at least a certain number of NaN values (say 30). As far as I understand, I cannot use dropna(thresh=30) with groupby:

**>>> df2.groupby('station').dropna(thresh=30)**
*AttributeError: Cannot access callable attribute 'dropna' of 'DataFrameGroupBy' objects...*

So what would be the best way to do this with Pandas?

3 answers

IIUC, you can do:

df2.loc[df2.groupby('station')['Temperature'].filter(lambda x: len(x[pd.isnull(x)]) < 30).index]

Example:

In [59]:
df = pd.DataFrame({'id':[0,0,0,1,1,1,2,2,2,2], 'val':[1,1,np.nan,1,np.nan,np.nan, 1,1,1,1]})
df

Out[59]:
   id  val
0   0  1.0
1   0  1.0
2   0  NaN
3   1  1.0
4   1  NaN
5   1  NaN
6   2  1.0
7   2  1.0
8   2  1.0
9   2  1.0

In [64]:    
df.loc[df.groupby('id')['val'].filter(lambda x: len(x[pd.isnull(x)] ) < 2).index]

Out[64]:
   id  val
0   0  1.0
1   0  1.0
2   0  NaN
6   2  1.0
7   2  1.0
8   2  1.0
9   2  1.0

This filters out the groups with more than one NaN value (here, the group with id 1).
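
The same idea can also be sketched by calling filter on the grouped DataFrame directly, which returns the surviving rows without indexing back through .loc. This is only a minimal sketch on the toy data above; the name max_nan is an illustrative assumption and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    'val': [1, 1, np.nan, 1, np.nan, np.nan, 1, 1, 1, 1],
})

max_nan = 2  # hypothetical threshold: keep groups with fewer than this many NaN

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True
kept = df.groupby('id').filter(lambda g: g['val'].isna().sum() < max_nan)
print(kept)
```

For the weather data, the grouping key would be the station column and max_nan would be 30.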


You can create a new column holding the NaN count per station_id, and then use loc to select the relevant rows.

df['station_id_null_count'] = \
    df.groupby('stationID').Temperature.transform(lambda group: group.isnull().sum())
df.loc[df.station_id_null_count < 30, :]  # Keep only stations with fewer than 30 NaN values
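
As a runnable illustration of this transform approach, here is a small sketch using the sample rows from the question. The threshold max_nan = 1 is chosen only so that the tiny example drops a station; it is an assumption for illustration, and the real data would use 30.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stationID': [123, 123, 202, 202, 202],
    'Time': [1, 2, 1, 2, 3],
    'Temperature': [30, 31, 24, 24.3, np.nan],
})

max_nan = 1  # hypothetical threshold for this toy data

# transform() broadcasts each station's NaN count back onto every row
# of that station, so the result lines up with df row by row
nan_count = df.groupby('stationID')['Temperature'].transform(
    lambda s: s.isna().sum()
)

# Keep only the rows of stations with fewer than max_nan missing values
cleaned = df[nan_count < max_nan]
print(cleaned)
```

Keeping the count in a separate Series rather than a new column leaves the original DataFrame unchanged; either form works.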

Using @EdChum's setup: since you did not specify the final output you want, I am adding this alternative, which returns one boolean per group.

   vals = df.groupby(['id'])['val'].apply(lambda x: (np.size(x)-x.count()) < 2 ) 

   vals[vals]

   id
   0    True
   2    True
   Name: val, dtype: bool
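
If the goal is the filtered rows rather than the per-group flags, the boolean Series can be mapped back onto the rows. The following is a sketch under that assumption, using the same toy data; the mapping step is an addition and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    'val': [1, 1, np.nan, 1, np.nan, np.nan, 1, 1, 1, 1],
})

# One boolean per group: True when the group has fewer than 2 NaN values
# (size counts all entries, count() only the non-NaN ones)
vals = df.groupby('id')['val'].apply(lambda x: (x.size - x.count()) < 2)

# Look up each row's group flag by its id and use it as a row mask
result = df[df['id'].map(vals)]
print(result)
```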

Source: https://habr.com/ru/post/1649067/

