There seems to be a quirk in the pandas merge function: it treats NaN values as equal to one another, so rows with a NaN key are matched with other rows whose key is NaN:
>>> import numpy as np
>>> import pandas as pd
>>> from pandas import DataFrame
>>> foo = DataFrame([
...     ['a', 1, 2],
...     ['b', 4, 5],
...     ['c', 7, 8],
...     [np.nan, 10, 11]
... ], columns=['id', 'x', 'y'])
>>> bar = DataFrame([
...     ['a', 3],
...     ['c', 9],
...     [np.nan, 12]
... ], columns=['id', 'z'])
>>> pd.merge(foo, bar, how='left', on='id')
Out[428]:
    id   x   y    z
0    a   1   2    3
1    b   4   5  NaN
2    c   7   8    9
3  NaN  10  11   12

[4 rows x 4 columns]
This is unlike any RDB I've seen. Normally missing values are treated agnostically and are not merged as if they were equal. This is especially problematic for sparse datasets: every NaN key is matched with every other NaN key, which can produce a huge DataFrame!
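To make the blow-up concrete, here is a minimal sketch with made-up frames (the names `left` and `right` and their values are purely illustrative, not from my real data). Each NaN key on the left is paired with each NaN key on the right, so the row count grows multiplicatively:

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose key column is mostly missing.
left = pd.DataFrame({"id": ["a", np.nan, np.nan, np.nan], "x": [1, 2, 3, 4]})
right = pd.DataFrame({"id": ["a", np.nan, np.nan], "z": [10, 20, 30]})

merged = pd.merge(left, right, how="left", on="id")

# One match for 'a', plus 3 left NaN keys x 2 right NaN keys = 7 rows,
# even though the left frame only has 4 rows.
print(len(left), len(right), len(merged))  # 4 3 7
```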
Is there a way to ignore missing values during the merge without dropping them?
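The only workaround I can think of, sketched here under the assumption that a left join is all that is needed, is to drop the NaN-key rows from the right-hand frame before merging. The left-hand rows with a NaN key then survive, but have nothing to match against:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["c", 9], [np.nan, 12]], columns=["id", "z"])

# Remove right-hand rows whose key is missing so they cannot match anything.
bar_keyed = bar.dropna(subset=["id"])

# Left join: every row of foo is kept; the NaN-key row gets NaN for 'z'.
result = pd.merge(foo, bar_keyed, how="left", on="id")
print(result)
```

This does not cover `how='outer'` or `how='right'`, where the NaN-key rows of `bar` would be lost, so it feels more like a hack than a real answer.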
aensm