How to replace all non-numeric entries with NaN in a pandas DataFrame?

I have various CSV files that I import as DataFrames. The problem is that the files use different markers for missing values: some use nan, others NaN, ND, None, absent, and so on, or simply leave the cell empty. Is there a way to replace all of these values with np.nan? In other words, any non-numeric value in the DataFrame should become np.nan. Thanks for the help.

1 answer

I found what I consider a relatively elegant, but also reliable method:

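    def isnumber(x):
        # True only if x can be converted to float
        try:
            float(x)
            return True
        except (TypeError, ValueError):
            return False

    # applymap is named DataFrame.map in pandas 2.1 and later
    df = df[df.applymap(isnumber)]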

In case this is not clear: you define a function that returns True only if its input can be converted to float. Applying it to every cell gives a boolean frame of the same shape as df. Indexing df with that boolean frame keeps the cells where the mask is True and sets every other cell to NaN.
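A minimal sketch of that masking step on a small made-up frame (the column names and values are invented for illustration). It uses `df.apply` with `Series.map` instead of `applymap`, only so that it runs unchanged on both older and newer pandas versions; the effect is the same element-wise call.

```python
import pandas as pd


def isnumber(x):
    """Return True only if x can be converted to float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical data mixing numbers, numeric strings and missing-value markers
df = pd.DataFrame({"a": [1, "ND", "3.5"], "b": ["absent", 2, None]})

# Boolean frame with the same shape as df
mask = df.apply(lambda col: col.map(isnumber))

# Cells where the mask is False become NaN; the rest are kept as they were
cleaned = df[mask]
print(cleaned)
```

Note that the numeric string "3.5" survives the filter but is still a string at this point.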

Another solution I tried was to define isnumber as

    import numbers

    def isnumber(x):
        return isinstance(x, numbers.Number)

but what I liked less about this approach is that a number can accidentally be stored as a string, in which case you would mistakenly filter it out. This is also a hard-to-spot error, because when the DataFrame is printed, the string "99" is displayed in the same way as the number 99.
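To make the difference concrete, here is a small sketch comparing the two checks on an invented column. The function names `isnumber_float` and `isnumber_strict` are made up here to tell the two variants apart.

```python
import numbers

import pandas as pd


def isnumber_float(x):
    """Lenient check: anything float() accepts counts as a number."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


def isnumber_strict(x):
    """Strict check: only real numeric objects count, not numeric strings."""
    return isinstance(x, numbers.Number)


# The number 99 and the string "99" look the same when printed
df = pd.DataFrame({"a": [99, "99", "ND"]})
print(df)

lenient = df["a"].map(isnumber_float)
strict = df["a"].map(isnumber_strict)

print(lenient.tolist())  # the string "99" passes
print(strict.tolist())   # the string "99" is rejected
```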

EDIT:

In your case, you probably still need df = df.applymap(float) after filtering, because float accepts all the different 'nan' spellings as strings, so they pass the filter; until you explicitly convert them, they will still be treated as strings in the data frame.
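A sketch of the full filter-then-convert sequence on invented data, again using `df.apply` with `Series.map` in place of `applymap` for version compatibility. For comparison it also shows `pd.to_numeric` with `errors="coerce"`, which is a different technique from the one in this answer but appears to give the same result on this sample.

```python
import pandas as pd


def isnumber(x):
    """Return True only if x can be converted to float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical data: "nan" and "NaN" are strings that float() accepts
df = pd.DataFrame({"a": [1, "nan", "ND", "4.5"], "b": ["NaN", 2, "absent", ""]})

# Step 1: mask out everything float() rejects
filtered = df[df.apply(lambda col: col.map(isnumber))]

# Step 2: convert what is left, so "nan" and "4.5" become real floats
cleaned = filtered.apply(lambda col: col.map(float))
print(cleaned.dtypes)

# Alternative (not the method above): let pandas coerce failures to NaN
alt = df.apply(pd.to_numeric, errors="coerce")
```

After step 1 the cell holding "nan" is still a string and is not reported as missing by `isna()`; only after step 2 is it a real NaN.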


Source: https://habr.com/ru/post/1014555/

