Pandas hierarchical data - copy data for all "child" values that have the same "parent" value

I am trying to figure out a way to copy data, in this case user information ('email_address', 'name'), to all sub-companies ('companyID') that belong to the same parent company ('parent_companyID').

My sample DataFrame:

import pandas as pd

sample_data = pd.DataFrame(
{'companyID': {0: 112, 1: 223, 2: 434, 3: 777, 4: 790},
 'email_address': {0: '112email@gmail.com',  1: '', 2: '434email@gmail.com', 3: '777email@gmail.com', 4: ''},
 'name': {0: 'Joe', 1: '', 2: '', 3: '', 4: 'George'},
 'parent_companyID': {0: 555, 1: 555, 2: 555, 3: 999, 4: 999}}
)

Or, for better readability:

    companyID   email_address        name    parent_companyID
0   112         112email@gmail.com   Joe         555
1   223                                          555
2   434         434email@gmail.com               555
3   777         777email@gmail.com               999
4   790                              George      999

I have searched extensively and could not find a similar question that solves this problem. I made many attempts using a MultiIndex, but got nothing close to the desired result, which is:

    companyID   email_address        name    parent_companyID
0   112         112email@gmail.com   Joe         555
1   112                                          555
2   112         434email@gmail.com               555
3   223         112email@gmail.com   Joe         555
4   223                                          555
5   223         434email@gmail.com               555
6   434         112email@gmail.com   Joe         555
7   434                                          555    
8   434         434email@gmail.com               555
9   777         777email@gmail.com               999
10  777                              George      999
11  790         777email@gmail.com               999
12  790                              George      999


Self Merge!

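  • Use pd.merge.
  • Merge the frame with itself on 'parent_companyID', so that each 'companyID' is paired with every row that shares its parent.
  • The merge produces two 'companyID' columns. The left one gets a suffix of a single space, ' ', which can be removed afterwards with str.strip.
  • Select the left 'companyID ' together with the right-hand user columns, in the original column order.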

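on = 'parent_companyID'
# Self merge: overlapping left-side columns get a trailing space, right-side ones keep their names
mrg = sample_data.merge(sample_data, on=on, suffixes=[' ', ''])
# Original column order, but with 'companyID' taken from the left side ('companyID ')
cols = sample_data.columns.tolist()
cols.remove('companyID')
cols.insert(0, 'companyID ')
# Strip the trailing space from the column name
mrg[cols].rename(columns=str.strip)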

    companyID       email_address     name  parent_companyID
0         112  112email@gmail.com      Joe               555
1         112                                            555
2         112  434email@gmail.com                        555
3         223  112email@gmail.com      Joe               555
4         223                                            555
5         223  434email@gmail.com                        555
6         434  112email@gmail.com      Joe               555
7         434                                            555
8         434  434email@gmail.com                        555
9         777  777email@gmail.com                        999
10        777                       George               999
11        790  777email@gmail.com                        999
12        790                       George               999
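
The same self merge can also be sketched without the space-suffix trick, by splitting the frame into a "keys" side and a "user data" side before merging. This is only an illustration of the idea above, not the answer's own code: the function name share_users_within_parent and its parameters are hypothetical, and it assumes the four columns from the question.

```python
import pandas as pd

def share_users_within_parent(df, parent_col="parent_companyID", child_col="companyID"):
    """Pair every child company with the user rows of all companies under the same parent."""
    # Left side: only the child/parent keys, one row per original row.
    keys = df[[child_col, parent_col]]
    # Right side: the user data, without its own child id (so no column names overlap).
    users = df.drop(columns=child_col)
    # Inner join on the parent: each child meets every user row of its parent group.
    out = keys.merge(users, on=parent_col)
    # Restore the original column order.
    return out[df.columns.tolist()].reset_index(drop=True)

sample_data = pd.DataFrame({
    "companyID": [112, 223, 434, 777, 790],
    "email_address": ["112email@gmail.com", "", "434email@gmail.com", "777email@gmail.com", ""],
    "name": ["Joe", "", "", "", "George"],
    "parent_companyID": [555, 555, 555, 999, 999],
})

result = share_users_within_parent(sample_data)

# Sanity check: a parent with n children contributes n * n rows, so 3*3 + 2*2 = 13 here.
expected_rows = int((sample_data.groupby("parent_companyID").size() ** 2).sum())
print(len(result), expected_rows)  # prints: 13 13
```

Because the two sides share only the join key, no suffixes or renaming are needed. Note that the row count grows quadratically with the number of companies under each parent.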

Source: https://habr.com/ru/post/1677580/

