How to implement sql coalesce in pandas

Question

I have a data frame like

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

I want to add a new column "D" that holds the first non-null value among A, B and C in each row, like SQL COALESCE. Expected result:

     A     B   C    D
0  1.0   NaN   5    1.0
1  2.0  10.0  10    2.0
2  NaN   NaN   7    7.0

Thanks in advance!

python pandas

Anoop Apr 3 '17 at 6:18

3 answers

jezrael · Answer 1 · 2017-04-03T06:22:14+0000

I think you need bfill along the columns (axis=1) and then select the first column with iloc:

df['D'] = df.bfill(axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0

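This is the same as the following, although fillna(method=...) is deprecated in recent pandas releases, so bfill is the safer choice: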

df['D'] = df.fillna(method='bfill',axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
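
If the frame has other columns, or the priority order differs from the column order, the same bfill idea can be limited to an explicit list. This is only a sketch: the `cols` list is my own assumption about the desired order, mirroring COALESCE(A, B, C).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Assumed priority order, mirroring COALESCE(A, B, C) in SQL.
cols = ["A", "B", "C"]

# Back-fill across the selected columns only, then keep the first of them,
# which now holds the first non-null value of each row.
df["D"] = df[cols].bfill(axis=1).iloc[:, 0]
print(df)
```

Selecting the columns first also keeps an existing "D" column from taking part in the fill.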

piRSquared · Answer 2 · 2017-04-03T06:24:13+0000

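Option 1: pandas (DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this runs only on older versions)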

df.assign(D=df.lookup(df.index, df.isnull().idxmin(1)))

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
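
On pandas versions without DataFrame.lookup, the same label-based idea might be written roughly as below. It is a sketch rather than a drop-in replacement: the names `first_valid`, `col_pos` and `result` are mine, and it assumes all columns are numeric so that to_numpy() gives a float array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Label of the first non-null column in each row
# (idxmin finds the first False in the null mask).
first_valid = df.isnull().idxmin(axis=1)

# Translate the column labels into integer positions, then pick one value per row.
col_pos = df.columns.get_indexer(first_valid)
values = df.to_numpy()[np.arange(len(df)), col_pos]

# assign returns a new frame and leaves df itself unchanged.
result = df.assign(D=values)
print(result)
```

A row that is entirely null ends up with NaN in D, because idxmin falls back to the first column.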

Option 2: numpy

v = df.values
j = np.isnan(v).argmin(1)
df.assign(D=v[np.arange(len(v)), j])

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0

[Figures: naive time test on the sample data and on larger data]

philshem · Answer 3 · 2017-04-03T09:02:18+0000

Another way is to explicitly populate column D with A, B, C in that order.

df['D'] = np.nan
df['D'] = df.D.fillna(df.A).fillna(df.B).fillna(df.C)
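
The chained fillna calls can be folded over any list of columns. A minimal sketch, assuming a hypothetical helper named `coalesce` (not part of pandas):

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})


def coalesce(frame, columns):
    """Hypothetical helper: first non-null value per row, checking columns in order."""
    first, *rest = columns
    # Start from the first column and fill its gaps from each later column in turn.
    return reduce(lambda acc, col: acc.fillna(frame[col]), rest, frame[first])


df["D"] = coalesce(df, ["A", "B", "C"])
print(df)
```

This gives the same result as the explicit chain, and the column order in the list decides which value wins.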
