Self-join with Pandas

I would like to do a self-join on a pandas DataFrame so that some rows are attached to the original rows. Each row has an "i" value indicating which row should be attached to its right.

import pandas as pd

d = pd.DataFrame(['A','B','C'], columns = ['some_col'])
d['i'] = [2,1,1]

In [17]: d
Out[17]:
  some_col  i
0        A  2
1        B  1
2        C  1

Desired output:

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B

That is, row 2 is attached to row 0, row 1 to row 1, and row 1 to row 2 (as indicated by i).

My idea for how to do this was:

 pd.merge(d, d, left_index = True, right_on = 'i', how = 'left') 

But it produces something completely different. What is the right way to do this?

3 answers

Use join with on='i':

 d.join(d.drop(columns='i'), on='i', rsuffix='_y')

   some_col  i some_col_y
 0        A  2          C
 1        B  1          B
 2        C  1          B
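
As a rough sketch of why this works: join with on='i' looks up each value of the caller's i column in the index of the other frame, and since join defaults to a left join, a value with no matching index label gives NaN instead of dropping the row. The frame below reuses the question's data; the frame d_missing with i = 5 is a made-up case added only for illustration.

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Look up each value of 'i' in the index of the right-hand frame.
# The right-hand frame drops 'i' so only 'some_col' gets the '_y' suffix.
joined = d.join(d.drop(columns='i'), on='i', rsuffix='_y')
print(joined)

# Hypothetical variant: the last row points at label 5, which does not exist,
# so the left join keeps the row and fills some_col_y with NaN.
d_missing = d.assign(i=[2, 1, 5])
joined_missing = d_missing.join(
    d_missing.drop(columns='i'), on='i', rsuffix='_y'
)
print(joined_missing)
```

Note that drop(columns='i') is used here because the positional form d.drop('i', 1) is not accepted by recent pandas versions.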

Instead of using merge you can also use indexing and assignment:

 >>> d['new_col'] = d['some_col'][d['i']].values
 >>> d
   some_col  i new_col
 0        A  2       C
 1        B  1       B
 2        C  1       B
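
A closely related variant, offered only as a sketch, is Series.map: passing a Series as the argument looks up each value of i among that Series' index labels and returns a result already aligned with d, so no .values is needed. This is a swapped-in technique rather than the answer's own code, and it assumes the values in i are index labels of d.

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Series.map with a Series argument looks up each value of 'i'
# among the index labels of d['some_col'].
# Values of 'i' with no matching label would become NaN.
d['new_col'] = d['i'].map(d['some_col'])
print(d)
```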

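Try the following. Note that the first variant matches each row's index against the other rows' i values, so it attaches rows in the opposite direction from the desired output: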

 In [69]: d.join(d.set_index('i'), rsuffix='_y')
 Out[69]:
   some_col  i some_col_y
 0        A  2        NaN
 1        B  1          B
 1        B  1          C
 2        C  1          A

or

 In [64]: pd.merge(d[['some_col']], d, left_index=True, right_on='i', suffixes=['_y','']).sort_index()
 Out[64]:
   some_col_y some_col  i
 0          C        A  2
 1          B        B  1
 2          B        C  1
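
For comparison, a merge that keeps the original frame on the left might look like the sketch below. The call in the question uses left_index=True with right_on='i', which matches each left row's index against the right frame's i values; using left_on='i' with right_index=True instead looks up each row's i in the other frame's index. The variable name merged is introduced here only for illustration.

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Left frame: the full table. Right frame: only the column to attach.
# Each left row's 'i' is matched against the right frame's index.
merged = pd.merge(
    d,
    d[['some_col']],
    left_on='i',
    right_index=True,
    how='left',
    suffixes=('', '_y'),  # keep the left name, suffix the attached column
)
print(merged)
```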

Source: https://habr.com/ru/post/1262192/

