How to combine two data frames excluding the NaN value column?

Given df1:

   size_a  size_b
0       1       2
1       1       5
2       2       3
3       2       9
4       3       1
5       3       5
6       4       4

and df2:

   size_a  size_b
0       1       2
1       2     NaN
2       3     NaN

I want the result to be as follows:

   size_a  size_b
0       1       2
1       2       3
2       2       9
3       3       1
4       3       5

For the intersection, I want to consider only the non-NaN values of df2: wherever df2 contains NaN, that column's value should be ignored when performing the intersection.

+4
3 answers

One way is to first merge on the column(s) that must always match exactly, with no NaN wildcard. This reduces the number of conditional filters you have to build downstream. In the example above, size_a is such a column:

new_df = df1.merge(df2, how='inner', on='size_a')

Next, keep only the rows where size_b matches or where the value from df2 is NaN.

new_df = new_df[(new_df['size_b_x'] == new_df['size_b_y']) | new_df['size_b_y'].isnull()]

Finally, drop the column that came from df2 (the one with the _y suffix):

new_df = new_df.drop('size_b_y', axis=1)
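
Putting the three steps together, a minimal runnable sketch might look like the following. The sample frames are rebuilt from the question. The final rename and reset_index are additions, not part of the answer above, assumed here only so that the output matches the layout the question asks for.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

# Step 1: inner join on the column that never holds NaN in df2.
new_df = df1.merge(df2, how="inner", on="size_a")

# Step 2: keep exact matches on size_b, or rows where df2's size_b is NaN.
keep = (new_df["size_b_x"] == new_df["size_b_y"]) | new_df["size_b_y"].isnull()
new_df = new_df[keep]

# Step 3: drop df2's copy of the column.
new_df = new_df.drop(columns="size_b_y")

# Added for presentation only: restore the column name and a 0..n-1 index.
new_df = new_df.rename(columns={"size_b_x": "size_b"}).reset_index(drop=True)
print(new_df)
```

Because the merge is on size_a only, size_b from df1 keeps its integer dtype throughout.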
+2

Alternatively, you can do this with a merge and a concat:

First, a plain merge on both columns:

part1 = pd.merge(df1, df2)

Then handle the rows of df2 that contain NaNs by merging on size_a only:

nans = df2[df2.size_b.isnull()]
part2 = pd.merge(df1, nans[["size_a"]], on="size_a")

Finally, concat the two parts:

pd.concat([part1, part2], ignore_index=True)

Result:

   size_a size_b
0       1      2
1       2      3
2       2      9
3       3      1
4       3      5
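
For reference, the merge-and-concat approach can be run end to end as in the sketch below. The sample frames are rebuilt from the question. The final astype call is an addition, assumed here because size_b can come back as float once it has been combined with df2's float column.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

# Rows of df1 that match a df2 row on both size_a and size_b.
part1 = pd.merge(df1, df2)

# df2 rows whose size_b is NaN act as wildcards: match on size_a alone.
nans = df2[df2["size_b"].isnull()]
part2 = pd.merge(df1, nans[["size_a"]], on="size_a")

# Stack the two parts and renumber the index.
result = pd.concat([part1, part2], ignore_index=True)

# Added for presentation only: cast size_b back to integers.
result = result.astype({"size_b": "int64"})
print(result)
```

Note that concat places the exact matches first and the wildcard matches second, so the row order follows the two parts rather than df1. With this sample data the two orders happen to coincide; sort the result if the order matters.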
+3

Another option is to merge on size_a, then filter with query and drop the helper column.

df_out = df1.merge(df2, on='size_a',suffixes=('','_y'))

df_out.query('size_b_y == size_b or size_b_y != size_b_y').drop('size_b_y',axis=1)

Output:

   size_a  size_b
0       1       2
2       2       3
3       2       9
4       3       1
5       3       5

Note: size_b_y != size_b_y is a trick for detecting NaN, because NaN is never equal to itself.
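
The self-comparison used in the query can be checked in isolation. The short sketch below uses a made-up Series `s` holding the same values as df2's size_b column, and compares the trick against isnull().

```python
import numpy as np
import pandas as pd

# NaN compares unequal to everything, including itself.
nan = float("nan")
print(nan == nan)
print(nan != nan)

# Hypothetical Series with the same values as df2's size_b column.
s = pd.Series([2.0, np.nan, np.nan])

# Element-wise self-comparison flags exactly the NaN positions.
mask_trick = s != s
mask_isnull = s.isnull()
print(mask_trick.equals(mask_isnull))
```

Outside of query strings, isnull() (or isna()) expresses the same test more readably.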

+2

Source: https://habr.com/ru/post/1683175/

