Pandas: merged (inner join) data frame has more rows than the original

I am using Python 3.4 in a Jupyter Notebook and trying to merge two data frames, as shown below:

df_A.shape
(204479, 2)

df_B.shape
(178, 3)

new_df = pd.merge(df_A, df_B,  how='inner', on='my_icon_number')
new_df.shape
(266788, 4)

I expected the merged new_df above to have fewer rows than df_A, since the merge is an inner join. Why does new_df actually have more rows than df_A?

Here is what I really want:

My df_A looks like this:

 id           my_icon_number
-----------------------------
 A1             123             
 B1             234
 C1             123
 D1             235
 E1             235
 F1             400

and my df_B looks like this:

my_icon_number    color      size
-------------------------------------
  123              blue      small
  234              red       large 
  235              yellow    medium

Then I want new_df to be:

 id           my_icon_number     color       size
--------------------------------------------------
 A1             123              blue        small
 B1             234              red         large
 C1             123              blue        small
 D1             235              yellow      medium
 E1             235              yellow      medium

I really don't want to remove duplicates of my_icon_number in df_A. Any idea what I missed here?

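Because the merge column contains duplicates in both data frames, you get k * m rows for each such value, where k is the number of rows with that value in data frame 1 and m is the number of rows with that value in data frame 2.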
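
As a rough check of the row count, one can multiply the per-key row counts of the two frames: for each key, an inner join yields (rows on the left) times (rows on the right). The sketch below uses small hypothetical frames (not the asker's real data) in which the key 123 appears twice on each side; the names k, m and expected_rows are illustrative only.

```python
import pandas as pd

# Hypothetical toy frames: the key 123 appears twice on each side.
df_A = pd.DataFrame({"id": ["A1", "B1", "C1"],
                     "my_icon_number": [123, 234, 123]})
df_B = pd.DataFrame({"my_icon_number": [123, 123, 234],
                     "color": ["blue", "navy", "red"]})

# Number of rows per key on each side.
k = df_A["my_icon_number"].value_counts()
m = df_B["my_icon_number"].value_counts()

# A key missing from one side contributes 0 rows to an inner join,
# so fill the missing count with 0 before multiplying.
expected_rows = int(k.mul(m, fill_value=0).sum())

merged = pd.merge(df_A, df_B, how="inner", on="my_icon_number")
print(expected_rows, len(merged))
```

If the two numbers printed agree and exceed len(df_A), repeated keys on the right-hand side are the likely cause of the extra rows.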

Try drop_duplicates:

dfa = df_A.drop_duplicates(subset=['my_icon_number'])
dfb = df_B.drop_duplicates(subset=['my_icon_number'])

new_df = pd.merge(dfa, dfb, how='inner', on='my_icon_number')
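
Since the question asks to keep the duplicates in df_A, a possible variation (an assumption on my part, not the method given above) is to deduplicate only df_B and let pandas verify that the right-hand keys are unique. The sketch rebuilds the question's df_A and uses a hypothetical df_B with one repeated key; the validate argument of pd.merge requires a reasonably recent pandas, which may not be what a Python 3.4 environment has installed.

```python
import pandas as pd

df_A = pd.DataFrame({
    "id": ["A1", "B1", "C1", "D1", "E1", "F1"],
    "my_icon_number": [123, 234, 123, 235, 235, 400],
})

# Hypothetical df_B in which the key 123 is repeated, mimicking the problem.
df_B = pd.DataFrame({
    "my_icon_number": [123, 123, 234, 235],
    "color": ["blue", "blue", "red", "yellow"],
    "size": ["small", "small", "large", "medium"],
})

# Deduplicate only the lookup table; df_A keeps all of its rows.
dfb = df_B.drop_duplicates(subset=["my_icon_number"])

# validate="many_to_one" makes pandas raise MergeError
# if dfb still contains repeated keys.
new_df = pd.merge(df_A, dfb, how="inner", on="my_icon_number",
                  validate="many_to_one")
print(new_df)
```

With unique keys in dfb, each row of df_A matches at most one row, so new_df can never have more rows than df_A; F1 is dropped because 400 has no match.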

In the example below, the key 4 appears 3 times in each data frame, so the merge produces 9 rows for that key.

import pandas as pd

df_A = pd.DataFrame(dict(my_icon_number=[1, 2, 3, 4, 4, 4], other_column1=range(6)))
df_B = pd.DataFrame(dict(my_icon_number=[4, 4, 4, 5, 6, 7], other_column2=range(6)))

pd.merge(df_A, df_B,  how='inner', on='my_icon_number')

   my_icon_number  other_column1  other_column2
0               4              3              0
1               4              3              1
2               4              3              2
3               4              4              0
4               4              4              1
5               4              4              2
6               4              5              0
7               4              5              1
8               4              5              2

Source: https://habr.com/ru/post/1666438/

