Here is a simple function that you can call directly, passing the data frame and a threshold (the percentage of missing values at or above which a column is dropped).
df

'''
     pets   location     owner     id
0     cat  San_Diego     Champ  123.0
1     dog        NaN       Ron    NaN
2     cat        NaN     Brick    NaN
3  monkey        NaN     Champ    NaN
4  monkey        NaN  Veronica    NaN
5     dog        NaN      John    NaN
'''
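The answer shows only the printed frame, so here is a minimal sketch of how it could be rebuilt for testing. It assumes the missing entries are `np.nan`; the construction itself is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame shown above (assumption: NaN cells are np.nan)
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego", np.nan, np.nan, np.nan, np.nan, np.nan],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

print(df)
```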
def rmissingvaluecol(dff, threshold):
    # Percentage of missing values in each column
    pct_missing = 100 * dff.isnull().sum() / len(dff.index)
    # Keep the columns whose missing percentage is below the threshold
    l = list(pct_missing[pct_missing < threshold].index)
    print("# Columns having at least %s percent missing values:" % threshold, dff.shape[1] - len(l))
    print("Columns:\n", list(set(dff.columns.values) - set(l)))
    return l

rmissingvaluecol(df, 1)
Now create a new data frame excluding these columns
l = rmissingvaluecol(df, 1)
df1 = df[l]
PS: the threshold is a percentage, so you can change it to suit your requirements.
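To see how the threshold changes the result, here is a small sketch on the same sample data. `columns_below_threshold` is a hypothetical helper written for this illustration, not the function from the answer; it uses the same rule (a column is dropped when its missing percentage is at or above the threshold).

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer (assumption: NaN cells are np.nan)
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego", np.nan, np.nan, np.nan, np.nan, np.nan],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

def columns_below_threshold(dff, threshold):
    """Hypothetical helper: names of columns whose missing percentage is below threshold."""
    pct_missing = 100 * dff.isnull().mean()
    return list(pct_missing[pct_missing < threshold].index)

# 'location' and 'id' are each 5/6 (about 83.33%) missing
kept_strict = columns_below_threshold(df, 1)   # drops columns with >= 1% missing
kept_loose = columns_below_threshold(df, 90)   # drops columns with >= 90% missing

print(kept_strict)  # ['pets', 'owner']
print(kept_loose)   # ['pets', 'location', 'owner', 'id']
```

With a threshold of 1, any column containing a missing value in this six-row frame is dropped; with 90, both sparse columns survive because they fall just under the cutoff.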
Bonus step
You can find the percentage of missing values for each column (optional)
def missing(dff):
    # Percentage of missing values per column, highest first
    print(round((dff.isnull().sum() * 100 / len(dff)), 2).sort_values(ascending=False))

missing(df)
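If you would rather get the percentages back as a value than print them, one possible variant is sketched below. `missing_pct` is a hypothetical name, and it uses `isnull().mean()`, which gives the same fraction as dividing the sum by the row count.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer (assumption: NaN cells are np.nan)
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego", np.nan, np.nan, np.nan, np.nan, np.nan],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

def missing_pct(dff):
    """Hypothetical variant: return (not print) the missing percentage per column."""
    # mean() of a boolean mask is the fraction of True values in each column
    return (dff.isnull().mean() * 100).round(2).sort_values(ascending=False)

result = missing_pct(df)
print(result)
```

Returning a Series makes it easy to filter afterwards, for example `result[result > 50]` to list only the mostly empty columns.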
Suhas_Pote Jun 19 '19 at 15:15