Effectively updating NaN in pandas data frame from the value of the previous row and specific column

Question

Effectively updating NaN in pandas data frame from the value of the previous row and specific column

I have it pandas'DataFrame, it looks like this:

# Output 
#        A     B     C     D
# 0    3.0   6.0   7.0   4.0
# 1   42.0  44.0   1.0   3.0
# 2    4.0   2.0   3.0  62.0
# 3   90.0  83.0  53.0  23.0
# 4   22.0  23.0  24.0   NaN
# 5    5.0   2.0   5.0  34.0
# 6    NaN   NaN   NaN   NaN
# 7    NaN   NaN   NaN   NaN
# 8    2.0  12.0  65.0   1.0
# 9    5.0   7.0  32.0   7.0
# 10   2.0  13.0   6.0  12.0
# 11   NaN   NaN   NaN   NaN
# 12  23.0   NaN  23.0  34.0
# 13  61.0   NaN  63.0   3.0
# 14  32.0  43.0  12.0  76.0
# 15  24.0   2.0  34.0   2.0

What I would like to do is populate NaN with the earliest previous row value B. Besides the column D, on this line I would like to replace NaN with zeros.

I looked into ffill, fillna .. and it seems that I could not cope with this task.

My decision:

def fix_abc(row, column, df):

    # If the row/column value is null/nan 
    if pd.isnull( row[column] ):

        # Get the value of row[column] from the row before
        prior = row.name
        value = df[prior-1:prior]['B'].values[0]

        # If that values empty, go to the row before that
        while pd.isnull( value ) and prior >= 1 :
            prior = prior - 1
            value = df[prior-1:prior]['B'].values[0]

    else:
        value = row[column]

    return value 

df['A'] = df.apply( lambda x: fix_abc(x,'A',df), axis=1 )
df['B'] = df.apply( lambda x: fix_abc(x,'B',df), axis=1 )
df['C'] = df.apply( lambda x: fix_abc(x,'C',df), axis=1 )


def fix_d(x):
    if pd.isnull(x['D']):
        return 0
    return x

df['D'] = df.apply( lambda x: fix_d(x), axis=1 )

It seems like it's pretty inefficient and slow. So I wonder if there is a faster and more efficient way to do this.

Output example;

#        A     B     C     D
# 0    3.0   6.0   7.0   3.0
# 1   42.0  44.0   1.0  42.0
# 2    4.0   2.0   3.0   4.0
# 3   90.0  83.0  53.0  90.0
# 4   22.0  23.0  24.0   0.0
# 5    5.0   2.0   5.0   5.0
# 6    2.0   2.0   2.0   0.0
# 7    2.0   2.0   2.0   0.0
# 8    2.0  12.0  65.0   2.0
# 9    5.0   7.0  32.0   5.0
# 10   2.0  13.0   6.0   2.0
# 11  13.0  13.0  13.0   0.0
# 12  23.0  13.0  23.0  23.0
# 13  61.0  13.0  63.0  61.0
# 14  32.0  43.0  12.0  32.0
# 15  24.0   2.0  34.0  24.0

I dumped the code containing the data for the data frame into a python script ( here )

+4

python lambda pandas dataframe

Andreecrist May 21 '17 at 14:50

1

Stephen Rauch · Accepted Answer · 2017-05-21T15:04:53+0000

fillna . D 0. B pad. A C B, :

:

df['D'] = df.D.fillna(0)
df['B'] = df.B.fillna(method='pad')
df['A'] = df.A.fillna(df['B'])
df['C'] = df.C.fillna(df['B'])

:

df = pd.read_fwf(StringIO(u"""
       A     B     C     D
     3.0   6.0   7.0   4.0
    42.0  44.0   1.0   3.0
     4.0   2.0   3.0  62.0
    90.0  83.0  53.0  23.0
    22.0  23.0  24.0   NaN
     5.0   2.0   5.0  34.0
     NaN   NaN   NaN   NaN
     NaN   NaN   NaN   NaN
     2.0  12.0  65.0   1.0
     5.0   7.0  32.0   7.0
     2.0  13.0   6.0  12.0
     NaN   NaN   NaN   NaN
    23.0   NaN  23.0  34.0
    61.0   NaN  63.0   3.0
    32.0  43.0  12.0  76.0
    24.0   2.0  34.0   2.0"""), header=1)

print(df)

df['D'] = df.D.fillna(0)
df['B'] = df.B.fillna(method='pad')
df['A'] = df.A.fillna(df['B'])
df['C'] = df.C.fillna(df['B'])
print(df)

:

       A     B     C     D
0    3.0   6.0   7.0   4.0
1   42.0  44.0   1.0   3.0
2    4.0   2.0   3.0  62.0
3   90.0  83.0  53.0  23.0
4   22.0  23.0  24.0   NaN
5    5.0   2.0   5.0  34.0
6    NaN   NaN   NaN   NaN
7    NaN   NaN   NaN   NaN
8    2.0  12.0  65.0   1.0
9    5.0   7.0  32.0   7.0
10   2.0  13.0   6.0  12.0
11   NaN   NaN   NaN   NaN
12  23.0   NaN  23.0  34.0
13  61.0   NaN  63.0   3.0
14  32.0  43.0  12.0  76.0
15  24.0   2.0  34.0   2.0

       A     B     C     D
0    3.0   6.0   7.0   4.0
1   42.0  44.0   1.0   3.0
2    4.0   2.0   3.0  62.0
3   90.0  83.0  53.0  23.0
4   22.0  23.0  24.0   0.0
5    5.0   2.0   5.0  34.0
6    2.0   2.0   2.0   0.0
7    2.0   2.0   2.0   0.0
8    2.0  12.0  65.0   1.0
9    5.0   7.0  32.0   7.0
10   2.0  13.0   6.0  12.0
11  13.0  13.0  13.0   0.0
12  23.0  13.0  23.0  34.0
13  61.0  13.0  63.0   3.0
14  32.0  43.0  12.0  76.0
15  24.0   2.0  34.0   2.0

Effectively updating NaN in pandas data frame from the value of the previous row and specific column

More articles: