Join two DataFrames on one key column / ERROR: "columns overlap but no suffix specified"

I have two tables, a sales table and a product table, and both share a "PART NUMBER" column. The "PART NUMBER" values are not unique in the sales table, but they are unique in the product table (see the snapshots of the sales table and the product table below).

[Images: snapshots of the sales table and the product table]

I tried to add the matching "Description" to each "PART NUMBER" in the sales table, following the examples on the pandas website. My code:

sales.join(part_table, on='PART NUMBER') 

But I got this error:

 ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object') 

Can someone explain what this error means and how to solve it?

Many thanks!

1 answer

I think you want to use merge, not join:

 sales.merge(part_table) 
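As a minimal sketch of how this could look for the tables in the question: the data below is made up (the real tables are only shown as images), and the "QTY" column is a hypothetical stand-in for whatever else the sales table holds. Naming the key with on and using how='left' keeps every sales row; validate='many_to_one' is an optional check that the key really is unique in the product table.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables in the question.
sales = pd.DataFrame({
    "PART NUMBER": ["A1", "B2", "A1"],   # not unique here
    "QTY": [5, 3, 7],                    # assumed extra column
})
part_table = pd.DataFrame({
    "PART NUMBER": ["A1", "B2"],         # unique here
    "Description": ["bolt", "nut"],
})

# Left merge: keep every sales row and look up its description.
# validate raises MergeError if "PART NUMBER" is duplicated in part_table.
result = sales.merge(
    part_table, on="PART NUMBER", how="left", validate="many_to_one"
)
print(result)
```

Because the key column is matched rather than duplicated, no suffixes are needed.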

Here's an example frame:

 In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

 In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])

 In [13]: dfa.join(dfb, on=['A'])
 ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')

 In [14]: dfa.merge(dfb)
 Out[14]:
    A  B  C
 0  1  2  a
 1  3  4  b
 2  3  4  c

It is not clear from the docs whether this is intentional (I thought that on would be used as the column), but following the exception message, if you add suffixes we can see what is going on:

 In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
 Out[21]:
    A_a  C  A_b   B
 0    1  a    3   4
 1    3  b  NaN NaN
 2    3  c  NaN NaN

 In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
 Out[22]:
    A_a  C  A_b   B
 0    1  a    1   2
 1    3  b    3   4
 2    3  c  NaN NaN

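join does not use on as a shared column: in the first call it matches the values of dfb's A column against the index of dfa (so A = 1 picks up the dfa row with index 1), and in the second it simply joins index to index. Either way both A columns are kept, which is why suffixes are required.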
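If you would rather stay with join, one option is to move the key into the index of the second frame, since join looks up the on column in the other frame's index. A minimal sketch with the same dfa and dfb as above:

```python
import pandas as pd

dfa = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
dfb = pd.DataFrame([[1, "a"], [3, "b"], [3, "c"]], columns=["A", "C"])

# With A as dfa's index, dfb's A values are looked up there,
# and dfa no longer contributes a second A column.
joined = dfb.join(dfa.set_index("A"), on="A")
print(joined)
```

The columns no longer overlap, so no suffixes are needed, and each dfb row picks up the B value for its own A.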


Source: https://habr.com/ru/post/1203341/

