Pandas - merge with missing values

There seems to be a quirk in the pandas merge function: it treats NaN values as equal, and will match a NaN with other NaNs:

    >>> foo = DataFrame([ ['a',1,2], ['b',4,5], ['c',7,8], [np.NaN,10,11] ], columns=['id','x','y'])
    >>> bar = DataFrame([ ['a',3], ['c',9], [np.NaN,12] ], columns=['id','z'])
    >>> pd.merge(foo, bar, how='left', on='id')
    Out[428]:
        id   x   y    z
    0    a   1   2    3
    1    b   4   5  NaN
    2    c   7   8    9
    3  NaN  10  11   12

    [4 rows x 4 columns]

This is unlike any RDBMS I've seen. Normally, missing values are treated agnostically and are not matched as if they were equal. This is especially problematic for sparse data sets: every NaN key is matched with every other NaN key, which can produce a huge DataFrame!
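To illustrate the blow-up, here is a minimal sketch. The frames `left` and `right` are made up for the illustration, with 3 and 4 missing ids respectively:

```python
import numpy as np
import pandas as pd

# Hypothetical sparse frames: 3 missing ids on the left, 4 on the right.
left = pd.DataFrame({"id": ["a"] + [np.nan] * 3, "x": range(4)})
right = pd.DataFrame({"id": ["a"] + [np.nan] * 4, "z": range(5)})

merged = pd.merge(left, right, how="left", on="id")

# One row for 'a', plus 3 * 4 rows from the NaN-to-NaN matches.
print(len(merged))  # 13
```

The row count grows with the product of the NaN counts on each side, so a few thousand missing keys per table would already yield millions of rows.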

Is there a way to ignore missing values during the merge without dropping them?

+6
3 answers

You could exclude the rows of bar (and indeed of foo, if you wanted) where id is null during the merge. I'm not sure it's what you're after, though, since those rows are dropped.

(I've assumed from your left join that you want to keep all of foo, but only want to merge in the parts of bar that match and are not null.)

    foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')
    Out[11]:
        id   x   y    z
    0    a   1   2    3
    1    b   4   5  NaN
    2    c   7   8    9
    3  NaN  10  11  NaN
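
One way to double-check that the NaN row of foo was not matched against anything is the `indicator` parameter of merge, which adds a `_merge` column. This is only a sketch that rebuilds the question's frames. The name `checked` is chosen here for illustration:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["c", 9], [np.nan, 12]], columns=["id", "z"])

# Drop the null-keyed rows of bar before merging, then ask merge to
# report where each output row came from.
checked = foo.merge(bar[bar["id"].notna()], how="left", on="id", indicator=True)

# 'both' means a match was found in bar; 'left_only' means no match.
print(checked[["id", "_merge"]])
```

The rows for 'b' and for the NaN id should be reported as `left_only`.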
+3

If you do not need the NaN rows in either the left or the right DataFrame, use:

pd.merge(foo.dropna(), bar.dropna(), how='left', on='id')

Otherwise, if you need to keep the NaN rows of the left DataFrame, use:

 pd.merge(foo, bar.dropna(), how='left', on='id') 
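
One caveat: a bare `dropna()` discards a row if any of its columns is NaN, not just the key. For sparse data it is probably safer to restrict it to the key column with `subset`. A minimal sketch, using a made-up variant of foo in which row 'b' has a missing x (the name `bar_keyed` is also just for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 'b' of foo has a missing x but a valid id.
foo = pd.DataFrame(
    [["a", 1, 2], ["b", np.nan, 5], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["b", 6], [np.nan, 12]], columns=["id", "z"])

# A bare foo.dropna() would keep only row 'a' here.
# subset=["id"] discards only the rows whose key is missing.
bar_keyed = bar.dropna(subset=["id"])
result = pd.merge(foo, bar_keyed, how="left", on="id")
print(result)
```

Here row 'b' still picks up z = 6, and the NaN-keyed row of foo is kept with a missing z.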
+1

If you want to keep the NaN rows without dropping them from the result, you can use an outer join as follows:

 pd.merge(foo, bar.dropna(), how='outer', on='id') 

Essentially, this returns the union of foo and the rows of bar that survive dropna().
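
Note that `bar.dropna()` in the call above still discards the NaN row of bar. If the rows with a missing id from both tables need to survive without being matched to each other, one possible approach is to merge only the rows that have a key and then append the key-less rows. This is a sketch rather than a built-in pandas option, and `merge_keep_nulls` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["c", 9], [np.nan, 12]], columns=["id", "z"])


def merge_keep_nulls(left, right, on):
    """Outer-join the rows that have a key, then append key-less rows unmatched."""
    matched = pd.merge(
        left[left[on].notna()], right[right[on].notna()], how="outer", on=on
    )
    # Rows with a missing key are carried over as-is from each side.
    unmatched = [left[left[on].isna()], right[right[on].isna()]]
    return pd.concat([matched, *unmatched], ignore_index=True)


result = merge_keep_nulls(foo, bar, on="id")
print(result)
```

With the question's data this gives five rows: 'a', 'b' and 'c' joined as usual, plus one NaN-keyed row from foo and one from bar, each with missing values in the other table's columns.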

+1

Source: https://habr.com/ru/post/970047/

