How to keep the index when using pandas merge

I would like to merge two DataFrames and keep the index of the first frame as the index of the merged data set. However, when I do the merge, the resulting DataFrame has an integer index. How can I tell pandas to keep the index of the left DataFrame?

```python
In [4]: a = pd.DataFrame({'col1': {'a': 1, 'b': 2, 'c': 3},
   ...:                   'to_merge_on': {'a': 1, 'b': 3, 'c': 4}})

In [5]: b = pd.DataFrame({'col2': {0: 1, 1: 2, 2: 3},
   ...:                   'to_merge_on': {0: 1, 1: 3, 2: 5}})

In [6]: a
Out[6]:
   col1  to_merge_on
a     1            1
b     2            3
c     3            4

In [7]: b
Out[7]:
   col2  to_merge_on
0     1            1
1     2            3
2     3            5

In [8]: a.merge(b, how='left')
Out[8]:
   col1  to_merge_on  col2
0     1            1   1.0
1     2            3   2.0
2     3            4   NaN

In [9]: _.index
Out[9]: Int64Index([0, 1, 2], dtype='int64')
```
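For anyone who wants to run the question outside IPython, here is a minimal script version of the session above using the same data. It checks only what the question observes: the merged frame gets a fresh integer index. `Int64Index` was removed in pandas 2.0, so recent versions display this index differently (typically as a `RangeIndex`), but the labels are the same.

```python
import pandas as pd

# The question's sample data: `a` has string labels, `b` has a default integer index.
a = pd.DataFrame({"col1": {"a": 1, "b": 2, "c": 3},
                  "to_merge_on": {"a": 1, "b": 3, "c": 4}})
b = pd.DataFrame({"col2": {0: 1, 1: 2, 2: 3},
                  "to_merge_on": {0: 1, 1: 3, 2: 5}})

# A column-on-column merge (here on the shared column "to_merge_on")
# builds a new 0..n-1 index, so the labels 'a', 'b', 'c' are dropped.
merged = a.merge(b, how="left")
print(merged.index.tolist())  # [0, 1, 2]
```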

EDIT: switched to sample code that can be easily reproduced.

+100
python pandas
Aug 15 '12 at 20:10
4 answers
```python
In [5]: a.reset_index().merge(b, how="left").set_index('index')
Out[5]:
       col1  to_merge_on  col2
index
a         1            1     1
b         2            3     2
c         3            4   NaN
```

Note that a left merge can return more rows than `a` has when a key matches several rows of `b`; in that case you may need to drop duplicates (see the pandas documentation on dropping duplicates). This is also why pandas does not keep the index for you.
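A small sketch of the duplicate-row case described in the note. `b_dup` is a hypothetical right frame (not from the question) in which the key 3 appears twice, so label `b` ends up on two rows. Keeping the first match per label with `Index.duplicated` is just one possible rule, chosen here for illustration.

```python
import pandas as pd

a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]},
                 index=["a", "b", "c"])
# Hypothetical right frame in which the key 3 appears twice.
b_dup = pd.DataFrame({"col2": [1, 2, 20], "to_merge_on": [1, 3, 3]})

# reset_index() turns the labels into a column named "index";
# set_index("index") turns that column back into the index after the merge.
out = a.reset_index().merge(b_dup, how="left").set_index("index")
print(out)
# Row "b" matches twice, so its label now appears twice in the index.
print(out.index.tolist())  # ['a', 'b', 'b', 'c']

# One way back to one row per original label: keep the first match.
dedup = out[~out.index.duplicated(keep="first")]
print(dedup.index.tolist())  # ['a', 'b', 'c']
```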

+126
Aug 16 '12

There is also a solution that avoids `pd.merge` altogether, using `map` and `set_index`:

```python
In [1744]: a.assign(col2=a['to_merge_on'].map(b.set_index('to_merge_on')['col2']))
Out[1744]:
   col1  to_merge_on  col2
a     1            1   1.0
b     2            3   2.0
c     3            4   NaN
```

It also avoids introducing the dummy name `index` for the index.
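A sketch of the `map` approach written out step by step. Because `map` returns a Series aligned with `a['to_merge_on']`, the result has exactly one row per row of `a` and keeps `a`'s labels. The catch is that `map` expects the lookup Series to have unique keys; the `drop_duplicates` call is an added assumption (keep the first row per key), not part of the answer above.

```python
import pandas as pd

a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]},
                 index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# Lookup Series: key -> col2. Keeping only the first row per key is an
# assumption made here so that every key appears once.
lookup = b.drop_duplicates("to_merge_on").set_index("to_merge_on")["col2"]

# map() looks up each value of a['to_merge_on'] in lookup's index and returns
# a Series carrying a's own index; keys with no match (4 here) become NaN.
out = a.assign(col2=a["to_merge_on"].map(lookup))
print(out)
```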

+5
Sep 11 '17 at 17:33
```python
df1 = df1.merge(df2, how="inner", left_index=True, right_index=True)
```

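This keeps the index of `df1`. Note that it joins on the index labels of both frames rather than on a column, so rows only match where the two frames share index labels.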
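A sketch of the index-on-index merge. `df1` and `df2` are hypothetical frames that already share index labels; with the question's `a` and `b` as they are, the labels `'a'`, `'b'`, `'c'` and `0`, `1`, `2` have nothing in common. The second half is a related variant, not part of this answer: move the key into `b`'s index and combine `left_on` with `right_index=True`; with `how="left"` pandas keeps the left frame's index.

```python
import pandas as pd

# Hypothetical frames that already share index labels.
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"y": [10, 30]}, index=["a", "c"])

# Index-on-index join: rows pair up by label and the labels stay as the index.
inner = df1.merge(df2, how="inner", left_index=True, right_index=True)
print(inner.index.tolist())  # ['a', 'c']

# Variant for the question's data: the key is a column of `a`, so index `b`
# by that key and join a's column against b's index.
a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]},
                 index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})
out = a.merge(b.set_index("to_merge_on"), how="left",
              left_on="to_merge_on", right_index=True)
print(out.index.tolist())  # ['a', 'b', 'c']
```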

+2
Apr 26 '19 at 6:43

You can copy the index of the left DataFrame into a regular column and then perform the merge:

```python
a['copy_index'] = a.index
a.merge(b, how='left')
```
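The two lines above only carry the labels through the merge as a column; to get them back as the index you still need a `set_index` afterwards. A minimal round trip with the question's data, where `rename_axis(None)` is an optional extra that drops the `copy_index` name from the index:

```python
import pandas as pd

a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]},
                 index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# Copy the labels into an ordinary column; merge() carries columns through.
a["copy_index"] = a.index
merged = a.merge(b, how="left")  # joins on the shared column "to_merge_on"

# Promote the copied column back to the index and drop its name.
merged = merged.set_index("copy_index").rename_axis(None)
print(merged)
```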

I found this simple method very useful when working with large DataFrames and `pd.merge_asof()` (or Dask's `dd.merge_asof()`).

Merging two DataFrames on an index is efficient, whereas resetting the index of a large DataFrame is expensive.
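A sketch of the same trick with `pd.merge_asof`, using the question's data. `merge_asof` needs both inputs sorted on the key and, with the default `direction="backward"`, matches each left key to the last right key that is less than or equal to it, so the row with key 4 picks up `col2 = 2`. The copied column is turned back into the index afterwards.

```python
import pandas as pd

a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]},
                 index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# merge_asof requires both frames sorted by the key (already true here).
left = a.assign(copy_index=a.index).sort_values("to_merge_on")
right = b.sort_values("to_merge_on")

# Backward search: each left key takes the last right row whose key <= it.
out = pd.merge_asof(left, right, on="to_merge_on")

# Restore the original labels from the copied column.
out = out.set_index("copy_index").rename_axis(None)
print(out)
```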

+2
Jul 27 '19 at 21:12