Python pandas: filtering out NaN from a data selection of a column of strings

Without using groupby, how would I filter out the rows that contain NaN?

Suppose I have a matrix where some customers fill in 'N/A', 'n/a' or any of its variations, while others leave the field blank:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                       'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                       'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

    nbs = df['name'].str.extract('^(N/A|NA|na|n/a)')
    nms = df[(df['name'] != nbs)]

Output:

    >>> nms
      movie    name  rating
    0   thg    John       3
    1   thg     NaN       4
    3   mol  Graham     NaN
    4   lob     NaN     NaN
    5   lob     NaN     NaN

How would I filter out the NaN values so that I get the following result:

      movie    name  rating
    0   thg    John       3
    3   mol  Graham     NaN

I assume I need something like ~np.isnan but the tilde does not work with strings.

+133
python pandas dataframe
Mar 21 '14 at 6:04
4 answers

Just drop them:

 nms.dropna(thresh=2) 

This drops every row that does not have at least two non-NaN values.

Then you can drop the rows where name is NaN:

    In [87]: nms
    Out[87]:
      movie    name  rating
    0   thg    John       3
    1   thg     NaN       4
    3   mol  Graham     NaN
    4   lob     NaN     NaN
    5   lob     NaN     NaN

    [5 rows x 3 columns]

    In [89]: nms = nms.dropna(thresh=2)

    In [90]: nms[nms.name.notnull()]
    Out[90]:
      movie    name  rating
    0   thg    John       3
    3   mol  Graham     NaN

    [2 rows x 3 columns]

EDIT

Actually, looking at what you originally wanted, you can do this without the dropna call at all:

 nms[nms.name.notnull()] 

UPDATE

Looking at this question three years later, there is a mistake to correct: the thresh argument requires at least n non-NaN values for a row to be kept, so the output of dropna should in fact be:

    In [4]: nms.dropna(thresh=2)
    Out[4]:
      movie    name  rating
    0   thg    John     3.0
    1   thg     NaN     4.0
    3   mol  Graham     NaN

Either I made a mistake three years ago, or the version of pandas I was running had a bug; both scenarios are entirely possible.
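As a quick check of the thresh behaviour described in the update, here is a minimal sketch. It assumes nms can be rebuilt by dropping row 2 from the question's frame (which matches the output the question shows) rather than by re-running the asker's str.extract filter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
    'rating': [3., 4., 5., np.nan, np.nan, np.nan],
    'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan],
})

# Assumption: reproduce the question's nms by removing the 'N/A' row (index 2).
nms = df.drop(index=2)

# thresh=2 keeps a row only if it holds at least two non-NaN values.
kept = nms.dropna(thresh=2)
print(kept.index.tolist())    # [0, 1, 3]

# Filtering on the name column afterwards leaves the rows the question asked for.
result = kept[kept['name'].notnull()]
print(result.index.tolist())  # [0, 3]
```

Row 1 survives dropna because movie and rating are both present; only the notnull filter on name removes it.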

+177
Mar 21 '14 at 8:33

The simplest of all solutions:

 filtered_df = df[df['name'].notnull()] 

This keeps only the rows that do not have NaN values in the 'name' column.
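Note that notnull() treats the string 'N/A' as a real value, so on the question's original df that row would remain. A possible sketch that handles both cases in one mask is shown here; the list na_variants is a hypothetical name that simply mirrors the spellings in the question's regular expression, and it would need extending for other variants.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
    'rating': [3., 4., 5., np.nan, np.nan, np.nan],
    'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan],
})

# Hypothetical list of placeholder spellings, taken from the question's regex.
na_variants = ['N/A', 'NA', 'na', 'n/a']

# Keep rows whose name is neither NaN nor one of the placeholder strings.
mask = df['name'].notnull() & ~df['name'].isin(na_variants)
filtered_df = df[mask]
print(filtered_df.index.tolist())  # [0, 3]
```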

+131
Dec 04 '17 at 8:38

    df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                       'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                       'name': ['John', 'James', np.nan, np.nan, np.nan, np.nan]})

    for col in df.columns:
        df = df[~pd.isnull(df[col])]

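The loop above removes a row as soon as any column holds NaN, which is stricter than what the question asked for (a row with a name but no rating is dropped too). As far as I can tell it is equivalent to a plain dropna() call; the sketch below compares the two on this answer's data, with looped and dropped as illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
    'rating': [3., 4., 5., np.nan, np.nan, np.nan],
    'name': ['John', 'James', np.nan, np.nan, np.nan, np.nan],
})

# Column-by-column filtering, as in the answer.
looped = df.copy()
for col in looped.columns:
    looped = looped[~pd.isnull(looped[col])]

# dropna() defaults to how='any': a row is removed if any column is NaN.
dropped = df.dropna()

print(looped.equals(dropped))  # True
```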
+4
Jan 09 '19 at 11:58
 df.dropna(subset=['columnName1', 'columnName2']) 
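Applied to the question's data, the column names above would presumably be replaced by 'name'. A small sketch under that assumption follows; note that the literal string 'N/A' is not NaN, so that row is kept and would still need separate handling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
    'rating': [3., 4., 5., np.nan, np.nan, np.nan],
    'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan],
})

# Drop only the rows whose 'name' is NaN; NaN in other columns is ignored.
result = df.dropna(subset=['name'])
print(result.index.tolist())  # [0, 2, 3]
```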
+1
Jun 06 '19 at 11:20