I am using Python 3.4 in a Jupyter Notebook, trying to combine two data frames as shown below:
df_A.shape
(204479, 2)
df_B.shape
(178, 3)
new_df = pd.merge(df_A, df_B, how='inner', on='my_icon_number')
new_df.shape
(266788, 4)
I thought the merged new_df above should have fewer rows than df_A, since the merge is an inner join. Why does new_df actually have more rows than df_A?
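A minimal sketch of how an inner merge can grow past the left frame, using hypothetical toy frames `left` and `right` (not the question's data). An inner join emits one row per matching (left, right) pair, so if a key value appears more than once in the right-hand frame, every left row with that key is repeated once per match. The second part lists which keys are repeated on the right.

```python
import pandas as pd

# Hypothetical toy frames: key 123 appears twice in the right-hand frame.
left = pd.DataFrame({"id": ["A1", "B1", "C1"], "my_icon_number": [123, 234, 123]})
right = pd.DataFrame({
    "my_icon_number": [123, 123, 234],
    "color": ["blue", "green", "red"],
})

merged = pd.merge(left, right, how="inner", on="my_icon_number")
# A1 and C1 (key 123) each match two right rows; B1 (key 234) matches one.
print(len(left), len(merged))  # 3 5

# Key values that occur more than once in the right-hand frame
dup_keys = right.loc[right["my_icon_number"].duplicated(keep=False), "my_icon_number"].unique()
print(dup_keys)
```

In pandas 0.21 and later, passing `validate="many_to_one"` to `pd.merge` makes it raise an error instead of silently multiplying rows when the right-hand keys are not unique; older pandas releases do not accept that argument.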
Here is what I really want:
My df_A looks like this:
id my_icon_number
-----------------------------
A1 123
B1 234
C1 123
D1 235
E1 235
F1 400
and my df_B looks like this:
my_icon_number color size
-------------------------------------
123 blue small
234 red large
235 yellow medium
Then I want new_df to be:
id my_icon_number color size
--------------------------------------------------
A1 123 blue small
B1 234 red large
C1 123 blue small
D1 235 yellow medium
E1 235 yellow medium
I really don't want to remove duplicates of my_icon_number in df_A. Any idea what I missed here?
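A minimal reproduction of the small example above, assuming `my_icon_number` holds integers. When each key appears only once in df_B, every df_A row matches at most one df_B row, so the inner merge keeps the duplicate keys in df_A and drops F1 (key 400 has no match). The sort by `id` is only for display, since the row order of an inner merge has differed between pandas versions.

```python
import pandas as pd

# Toy versions of the two frames shown in the question
df_A = pd.DataFrame({
    "id": ["A1", "B1", "C1", "D1", "E1", "F1"],
    "my_icon_number": [123, 234, 123, 235, 235, 400],
})
df_B = pd.DataFrame({
    "my_icon_number": [123, 234, 235],
    "color": ["blue", "red", "yellow"],
    "size": ["small", "large", "medium"],
})

# The merge can only add rows if a key repeats in df_B; here it does not.
assert df_B["my_icon_number"].is_unique
new_df = pd.merge(df_A, df_B, how="inner", on="my_icon_number")

# Sort by id so the output lines up with the desired table.
new_df = new_df.sort_values("id").reset_index(drop=True)
print(new_df)
```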