Drop rows with a question mark in any column of a pandas DataFrame

I want to drop every row that has a question mark character in any column (or, equivalently, keep only the rows without one). I also want to convert the remaining elements to float.

Input:

 X Y Z
 0 1 ?
 1 2 3
 ? ? 4
 4 4 4
 ? 2 5

Output:

 X Y Z
 1 2 3
 4 4 4

Preferably this should be done with pandas DataFrame operations.

+6
2 answers

You can first look for the string '?' in each column, build a boolean mask, and then filter the rows with boolean indexing. If you need to convert the columns to float, use astype:

 mask = ~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?'))
 print(mask)
 0    False
 1     True
 2    False
 3     True
 4    False
 dtype: bool

 df1 = df[mask].astype(float)
 print(df1)
      X    Y    Z
 1  1.0  2.0  3.0
 3  4.0  4.0  4.0

 print(df1.dtypes)
 X    float64
 Y    float64
 Z    float64
 dtype: object

Or you can convert the columns with to_numeric, which turns '?' into NaN, and then keep only the rows without NaN:

 df['X'] = pd.to_numeric(df['X'], errors='coerce')
 df['Y'] = pd.to_numeric(df['Y'], errors='coerce')
 df['Z'] = pd.to_numeric(df['Z'], errors='coerce')
 print(df)
      X    Y    Z
 0  0.0  1.0  NaN
 1  1.0  2.0  3.0
 2  NaN  NaN  4.0
 3  4.0  4.0  4.0
 4  NaN  2.0  5.0

 mask = df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull()
 print(mask)
 0    False
 1     True
 2    False
 3     True
 4    False
 dtype: bool

 print(df[mask].astype(float))
      X    Y    Z
 1  1.0  2.0  3.0
 3  4.0  4.0  4.0
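
The per-column to_numeric calls above can also be applied to every column at once. The following is a minimal sketch, assuming the sample frame from the question holds string values; the names numeric and result are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question; values are assumed to be strings.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Coerce every column: anything non-numeric (here '?') becomes NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Drop the rows that contain at least one NaN.
result = numeric.dropna()
print(result)
```

Note that this drops any row with a non-numeric value, not only rows containing '?', so it fits best when '?' is the only placeholder in the data.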

It is better to compare the whole DataFrame at once:

 df = df[(df != '?').all(axis=1)] 

Or:

 df = df[~(df == '?').any(axis=1)] 
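
As a self-contained sketch of the one-line filter, assuming the sample frame from the question is built by hand with string values (the name clean is illustrative), the float conversion asked for in the question can be chained on with astype:

```python
import pandas as pd

# Sample data from the question; values are assumed to be strings,
# as they would be after reading a file that contains '?' markers.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Keep only the rows where no column equals '?', then convert to float.
clean = df[(df != '?').all(axis=1)].astype(float)
print(clean)
```

Unlike the column-by-column mask, this does not need the column names spelled out, so it keeps working if columns are added.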
+6

You can try replacing ? with null (NaN) values:

 import numpy as np
 # Pass the np.nan object itself, not the string "np.Nan"
 data = df.replace("?", np.nan)

If you want to replace values in a specific column only, try this:

 data = df["column name"].replace("?", np.nan)
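
Replacing '?' with NaN only marks the bad cells; to get the output the question asks for, the marked rows still have to be dropped and the rest converted. A minimal sketch of that follow-up, assuming the sample frame from the question with string values (chaining dropna and astype is an addition here, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Sample data from the question; values are assumed to be strings.
df = pd.DataFrame({
    'X': ['0', '1', '?', '4', '?'],
    'Y': ['1', '2', '?', '4', '2'],
    'Z': ['?', '3', '4', '4', '5'],
})

# Mark '?' as missing, drop rows with any missing value, convert to float.
data = df.replace('?', np.nan).dropna().astype(float)
print(data)
```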
+1

Source: https://habr.com/ru/post/1244045/

