I am trying to combine two data frames in pandas with a common column name (orderid). The resulting data frame (combined data block) discards the order from the second data frame. In the documentation, the 'on' column should be kept unless you explicitly specify this.
import pandas as pd df = pd.DataFrame([[1,'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue']) df['orderid'] = df['orderid'].astype(str) df2 = pd.DataFrame([[1,200], [2, 300], [3, 400], [4,500]], columns=['orderid', 'ordervalue']) df2['orderid'] = df2['orderid'].astype(str) pd.merge(df, df2, on='orderid', how='outer', copy=True, suffixes=('_left', '_right'))
What deduces this:
| |orderid | ordervalue_left | ordervalue_right | |------|--------|-----------------|------------------| | 0 | 1 | a | 200 | | 1 | 2 | b | 300 | | 2 | 3 | c | 400 | | 3 | 4 | | 500 |
What I'm trying to create is the following:
| | orderid_left | ordervalue_left | orderid_left | ordervalue_right | |------|--------------|-----------------|--------------|------------------| | 0 | 1 | a | 1 | 200 | | 1 | 2 | b | 2 | 300 | | 2 | 3 | c | 3 | 400 | | 3 | NaN | NaN | 4 | 500 |
How do I write this?
source share