Pandas merge gives a strange result when the left and right key columns differ

I found that the pandas merge function gives a strange result when the key columns on the left and right are different.

For instance, I define the left and right DataFrames as follows:

left_df

   0  1  2  3  4  5
0  1  2  1  2  3  4
1  2  3  2  3  4  5
2  1  2  3  4  5  6
3  2  2  4  5  6  7
4  2  3  5  6  7  8

right_df

   0  1  2  3  4  5
0  1  2  3  4  5  6
1  1  2  3  4  5  7
2  2  3  4  5  6  7
3  2  3  4  5  6  8

and merge them with the following parameters:

pd.merge(left_df, right_df, how="inner", left_on = [0,1], right_on=[0,1], indicator=False)

The result is as expected.

       0  1  2_x  3_x  4_x  5_x  2_y  3_y  4_y  5_y
    0  1  2    1    2    3    4    3    4    5    6
    1  1  2    1    2    3    4    3    4    5    7
    2  1  2    3    4    5    6    3    4    5    6
    3  1  2    3    4    5    6    3    4    5    7
    4  2  3    2    3    4    5    4    5    6    7
    5  2  3    2    3    4    5    4    5    6    8
    6  2  3    5    6    7    8    4    5    6    7
    7  2  3    5    6    7    8    4    5    6    8
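For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames from the tables above and runs the same merge. It assumes the columns are the default integer labels 0 to 5, as the printed headers suggest. The variable name `same_keys` is only illustrative, and the row order may differ between pandas versions.

```python
import pandas as pd

# Frames copied from the tables above; columns are the default integer labels 0..5.
left_df = pd.DataFrame([
    [1, 2, 1, 2, 3, 4],
    [2, 3, 2, 3, 4, 5],
    [1, 2, 3, 4, 5, 6],
    [2, 2, 4, 5, 6, 7],
    [2, 3, 5, 6, 7, 8],
])
right_df = pd.DataFrame([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 7],
    [2, 3, 4, 5, 6, 7],
    [2, 3, 4, 5, 6, 8],
])

# Same key columns on both sides: the keys appear once, and the other
# overlapping columns get the _x / _y suffixes.
same_keys = pd.merge(left_df, right_df, how="inner", left_on=[0, 1], right_on=[0, 1])
print(same_keys)
```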

But if left_on and right_on name different columns, the result becomes very strange, as shown below.

Merge with [1, 2] as the left keys:

pd.merge(left_df, right_df, how="inner", left_on = [1,2], right_on=[0,1], indicator=False)


   1  2  0_x  1_x  2_x  3_x  4_x  5_x  0_y  1_y  2_y  3_y  4_y  5_y
0  2  3    1    2    3    4    5    6    2    3    4    5    6    7
1  2  3    1    2    3    4    5    6    2    3    4    5    6    8

                ^    ^                   ^    ^
                 these columns are duplicated.

   0_x    1    2  3_x  4_x  5_x  2_y  3_y  4_y  5_y
0    1    2    3    4    5    6    4    5    6    7
1    1    2    3    4    5    6    4    5    6    8
This is what I expected (the key columns of each DataFrame are not repeated).

Is there a parameter or workaround that avoids this behavior?

1 answer

I kept wondering about the strange result I mentioned, so I will share my own assumptions.

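Renaming the key columns to the same names in both DataFrames before merging gives the result I expected. For example, with the key columns renamed to key0 and key1: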

left_df
   0  key0  key1  3  4  5
0  1     2     1  2  3  4
1  2     3     2  3  4  5
2  1     2     3  4  5  6
3  2     2     4  5  6  7
4  2     3     5  6  7  8

right_df
   key0  key1  2  3  4  5
0     1     2  3  4  5  6
1     1     2  3  4  5  7
2     2     3  4  5  6  7
3     2     3  4  5  6  8

result
   0  key0  key1  3_x  4_x  5_x  2  3_y  4_y  5_y
0  1     2     3    4    5    6  4    5    6    7
1  1     2     3    4    5    6  4    5    6    8

The columns can be renamed like this:

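        # Key columns on each side, in matching order
        left_keys = [1, 2]
        right_keys = [0, 1]

        # Common names for the key columns: 'key0', 'key1', ...
        key_entry = ['key' + str(i) for i in range(len(left_keys))]

        # Map each original key column to its common name
        left_rename_map = dict(zip(left_keys, key_entry))
        right_rename_map = dict(zip(right_keys, key_entry))

        # df1 and df2 are the left and right DataFrames (left_df and right_df above)
        df1 = df1.rename(columns=left_rename_map)
        df2 = df2.rename(columns=right_rename_map)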
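To tie the pieces together, here is a minimal end-to-end sketch of the renaming approach, applied to the frames from the question. `merge_with_renamed_keys` is a hypothetical helper name, not a pandas function, and the `key0`, `key1`, ... names are assumed not to clash with columns that already exist in either frame.

```python
import pandas as pd


def merge_with_renamed_keys(left, right, left_keys, right_keys, how="inner"):
    """Hypothetical helper: rename the key columns to shared names, then merge."""
    if len(left_keys) != len(right_keys):
        raise ValueError("left_keys and right_keys must have the same length")
    # Shared names 'key0', 'key1', ... (assumed unused in both frames)
    shared = [f"key{i}" for i in range(len(left_keys))]
    left = left.rename(columns=dict(zip(left_keys, shared)))
    right = right.rename(columns=dict(zip(right_keys, shared)))
    # With identical key names on both sides, each key column appears once.
    return pd.merge(left, right, how=how, on=shared)


# Frames copied from the question; columns are the default integer labels 0..5.
left_df = pd.DataFrame([
    [1, 2, 1, 2, 3, 4],
    [2, 3, 2, 3, 4, 5],
    [1, 2, 3, 4, 5, 6],
    [2, 2, 4, 5, 6, 7],
    [2, 3, 5, 6, 7, 8],
])
right_df = pd.DataFrame([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 7],
    [2, 3, 4, 5, 6, 7],
    [2, 3, 4, 5, 6, 8],
])

result = merge_with_renamed_keys(left_df, right_df, [1, 2], [0, 1])
print(result)
```

Only the left row whose columns 1 and 2 hold (2, 3) matches the right keys, so the sketch should return the two rows shown in the result table above.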



Source: https://habr.com/ru/post/1672310/

