Remove NaN cells without dropping the entire row (Pandas, Python 3)

I have a DataFrame like this:

  Word    Word2         Word3
  Hello   NaN           NaN
  My      My Name       NaN
  Yellow  Yellow Bee    Yellow Bee Hive
  Golden  Golden Gates  NaN
  Yellow  NaN           NaN

I want to remove all NaN cells from my data frame so that the remaining values shift up within each column. In the end it should look like this, with "Yellow Bee Hive" moved up to row 1 (similar to what happens when you delete cells from a column in Excel):

       Word         Word2            Word3
  1   Hello       My Name  Yellow Bee Hive
  2      My    Yellow Bee
  3  Yellow  Golden Gates
  4  Golden
  5  Yellow

Unfortunately, neither of the following attempts works, because both delete the entire row:

  df = df[pd.notnull(df['Word','Word2','Word3'])] 

or

  df = df.dropna() 

Does anyone have any suggestions? Should I reindex the table?

2 answers
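  import functools

  import numpy as np
  import pandas as pd

  def drop_and_roll(col, na_position='last', fillvalue=np.nan):
      # Start from a column filled entirely with the fill value.
      result = np.full(len(col), fillvalue, dtype=col.dtype)
      mask = col.notnull()
      N = mask.sum()
      if na_position == 'last':
          # Non-null values go to the top, fill values to the bottom.
          result[:N] = col.loc[mask]
      elif na_position == 'first':
          # len(col) - N rather than -N, so an all-NaN column (N == 0) also works.
          result[len(col) - N:] = col.loc[mask]
      else:
          raise ValueError('na_position {!r} unrecognized'.format(na_position))
      return result

  # Columns are separated by two or more spaces; a regex separator
  # requires the python parsing engine.
  df = pd.read_table('data', sep=r'\s{2,}', engine='python')
  print(df.apply(functools.partial(drop_and_roll, fillvalue='')))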

gives

       Word         Word2            Word3
  0   Hello       My Name  Yellow Bee Hive
  1      My    Yellow Bee
  2  Yellow  Golden Gates
  3  Golden
  4  Yellow
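
The answer above reads its input from a file named `data` whose contents are not shown. The following is a minimal sketch of how that step might be reproduced without the file, assuming the file holds the question's table with columns separated by two or more spaces. The layout of `raw` is an assumption, not something the answer specifies.

```python
import io

import pandas as pd

# Assumed contents of the 'data' file: columns are separated by two or more
# spaces, so single spaces inside values such as "Yellow Bee Hive" survive.
raw = """\
Word    Word2         Word3
Hello   NaN           NaN
My      My Name       NaN
Yellow  Yellow Bee    Yellow Bee Hive
Golden  Golden Gates  NaN
Yellow  NaN           NaN
"""

# A regex separator needs the python engine; the literal text "NaN" is
# converted to a missing value by default.
df = pd.read_table(io.StringIO(raw), sep=r'\s{2,}', engine='python')

print(df)
print(df.isna().sum())  # number of missing cells per column
```

Splitting on two or more spaces is what keeps multi-word cells such as "Golden Gates" in a single column. Splitting on any whitespace would break them apart.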

Since you want the values to move up within each column, you need to rebuild the columns, for example by creating a new data frame.

Let's start with:

       Word         Word2
  0   Hello           NaN
  1      My       My Name
  2  Yellow    Yellow Bee
  3  Golden  Golden Gates
  4  Yellow           NaN

Using the following function:

  import numpy as np
  import pandas as pd

  def get_column_array(df, column):
      expected_length = len(df)
      # Keep only the non-null values of the column.
      current_array = df[column].dropna().values
      if len(current_array) < expected_length:
          # Pad with empty strings so every column has the original length.
          current_array = np.append(
              current_array, [''] * (expected_length - len(current_array)))
      return current_array

  pd.DataFrame({column: get_column_array(df, column) for column in df.columns})

gives:

       Word         Word2
  0   Hello       My Name
  1      My    Yellow Bee
  2  Yellow  Golden Gates
  3  Golden
  4  Yellow

You can also edit an existing df in place with the same function:

  for column in df.columns:
      df[column] = get_column_array(df, column)
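
Both answers drop the NaN cells from each column and pad the column back to its original length. As a rough sketch of the same idea using only pandas operations (a different technique from either answer, shown for comparison), each column can be rebuilt as a fresh Series and the gaps filled afterwards. The frame is built inline here as an assumption about the question's data.

```python
import numpy as np
import pandas as pd

# Example frame from the question, built inline (NaN marks the empty cells).
df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

# For each column: drop the NaN cells, then wrap the survivors in a fresh
# Series so they are renumbered from 0 and therefore "move up".
shifted = df.apply(lambda col: pd.Series(col.dropna().to_numpy()))

# Restore the original number of rows (this matters if every column had at
# least one NaN) and blank out the padding at the bottom of shorter columns.
shifted = shifted.reindex(range(len(df))).fillna('')

print(shifted)
```

Note that this sketch renumbers the rows from 0, so a custom index on the original frame would not be preserved.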

Source: https://habr.com/ru/post/1203005/

