Delete NaN / NULL columns in Pandas data frame?

I have a DataFrame in pandas, and some of its columns contain only null values. Is there a built-in function that will let me remove these columns?

+51
python pandas nan dataframe
Jun 01 2018-12-12T00:
3 answers

Yes, dropna. See http://pandas.pydata.org/pandas-docs/stable/missing_data.html and the DataFrame.dropna docstring:

    Definition: DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None)
    Docstring:
    Return object with labels on given axis omitted where alternately any
    or all of the data are missing

    Parameters
    ----------
    axis : {0, 1}
    how : {'any', 'all'}
        any : if any NA values are present, drop that label
        all : if all values are NA, drop that label
    thresh : int, default None
        int value : require that many non-NA values
    subset : array-like
        Labels along other axis to consider, e.g. if you are dropping rows
        these would be a list of columns to include

    Returns
    -------
    dropped : DataFrame

Specific command to run:

    df = df.dropna(axis=1, how='all')
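To make the difference between the options concrete, here is a minimal sketch on a made-up three-column frame. The column names and values are illustrative assumptions, not data from the question. It contrasts how='all' with the default how='any' and with thresh:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "empty" is entirely NaN, "partial" has one NaN.
df = pd.DataFrame({
    "full": [1, 2, 3],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# how='all' drops a column only if every value in it is missing.
only_all_nan_dropped = df.dropna(axis=1, how="all")

# how='any' (the default) drops a column that has at least one missing value.
any_nan_dropped = df.dropna(axis=1, how="any")

# thresh=N keeps only the columns with at least N non-missing values.
at_least_three = df.dropna(axis=1, thresh=3)

print(list(only_all_nan_dropped.columns))
print(list(any_nan_dropped.columns))
print(list(at_least_three.columns))
```

For the question as asked, how='all' is the variant that removes only the completely empty columns and leaves partially filled ones alone.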
+94
Jun 02 2018-12-12T00:

Function to remove all empty columns from a data frame:

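    import pandas as pd

    def Remove_Null_Columns(df):
        # Start from an empty frame that shares the original index
        dff = pd.DataFrame(index=df.index)
        for cl in df.columns:
            # Copy the column unless every value in it is null
            if df[cl].isnull().sum() != len(df[cl]):
                dff[cl] = df[cl]
        return dff

This function returns a new data frame without the empty columns; df itself is not modified.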

-1
Jun 29 '18 at 6:41

Here is a simple function that you can use directly by passing it a data frame and a threshold:

    df
    '''
         pets   location     owner     id
    0     cat  San_Diego     Champ  123.0
    1     dog        NaN       Ron    NaN
    2     cat        NaN     Brick    NaN
    3  monkey        NaN     Champ    NaN
    4  monkey        NaN  Veronica    NaN
    5     dog        NaN      John    NaN
    '''



    def rmissingvaluecol(dff, threshold):
        # True for each column whose percentage of missing values reaches the threshold
        too_sparse = list(100 * (dff.isnull().sum() / len(dff.index)) >= threshold)
        # Names of the columns that remain once the sparse ones are dropped
        l = list(dff.drop(dff.loc[:, too_sparse].columns, axis=1).columns.values)
        print("# Columns having more than %s percent missing values:" % threshold, (dff.shape[1] - len(l)))
        print("Columns:\n", list(set(list((dff.columns.values))) - set(l)))
        return l

    rmissingvaluecol(df, 1)
    # Here the threshold is 1%, which means we drop columns with 1% or more missing values

    # output
    '''
    # Columns having more than 1 percent missing values: 2
    Columns:
     ['id', 'location']
    '''

Now create a new data frame excluding these columns:

    l = rmissingvaluecol(df, 1)
    df1 = df[l]

PS: You can change the threshold to suit your requirements.
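The same threshold idea can also be sketched with a boolean mask over the per-column missing percentage. This is only an illustrative alternative, not the answer's own method: drop_sparse_columns is a hypothetical helper name, and the frame is rebuilt by hand from the sample data shown in this answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the data shown in this answer.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego", np.nan, np.nan, np.nan, np.nan, np.nan],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})


def drop_sparse_columns(frame, threshold):
    """Return frame without columns whose missing percentage is >= threshold.

    Hypothetical helper for illustration; threshold is a percentage (0-100).
    """
    # isna().mean() gives the fraction of missing values in each column.
    missing_pct = frame.isna().mean() * 100
    # Keep only the columns that stay below the threshold.
    return frame.loc[:, missing_pct < threshold]


df1 = drop_sparse_columns(df, 1)
print(list(df1.columns))
```

Returning the filtered frame directly avoids the intermediate list of column names, at the cost of not printing which columns were removed.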

Bonus step

You can find the percentage of missing values for each column (optional):

    def missing(dff):
        print(round((dff.isnull().sum() * 100 / len(dff)), 2).sort_values(ascending=False))

    missing(df)

    # output
    '''
    id          83.33
    location    83.33
    owner        0.00
    pets         0.00
    dtype: float64
    '''
-1
Jun 19 '19 at 15:15


