Groupby or Looping for conditional replacement

Relatively new to Python. I have a dataframe of the following form:

ID     DEPT     DOMAIN          
201606  271     GE
**201606  896     IR**
201608  271     GE
201609  271     GE
.....................            
...................           
**201701  896     FR**
201606  271     GE

I want to find all departments (DEPT) whose DOMAIN has changed in the rows whose ID starts with 2017.
I would then like to replace the DOMAIN in the 2016* rows with the DOMAIN value from the corresponding department's 2017* rows.
For example, in the df above, I would like to replace the DOMAIN value in the 2016* rows for DEPT 896 with FR, which is the DOMAIN value for that DEPT in the 2017* rows.
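To make the question reproducible, here is a minimal sketch that rebuilds only the six rows visible above (the elided rows are left out) and lists the departments that appear with more than one DOMAIN. The names `n_domains` and `changed` are mine, not part of the original data.

```python
import pandas as pd

# Only the six rows shown in the question; the elided rows are omitted.
df = pd.DataFrame({
    "ID": [201606, 201606, 201608, 201609, 201701, 201606],
    "DEPT": [271, 896, 271, 271, 896, 271],
    "DOMAIN": ["GE", "IR", "GE", "GE", "FR", "GE"],
})

# Count distinct DOMAIN values per DEPT; more than one means the domain changed.
n_domains = df.groupby("DEPT")["DOMAIN"].nunique()
changed = n_domains[n_domains > 1].index.tolist()
print(changed)  # [896]
```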

2 answers

It seems to me you first need sort_values, then duplicated to pick the last row of each DEPT, keep only those rows that fall in 2017, and finally use map + fillna:

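# True for the last row of each DEPT when ordered by ID
m1 = ~df.sort_values('ID').duplicated('DEPT', keep='last')
# True for rows whose ID starts with 2017
m2 = df['ID'].astype(str).str[:4] == '2017'
# DEPT -> DOMAIN lookup built from rows that satisfy both conditions
s = df[m1 & m2].set_index('DEPT')['DOMAIN']
# Use the lookup value where one exists, otherwise keep the current DOMAIN
df['DOMAIN'] = df['DEPT'].map(s).fillna(df['DOMAIN'])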

print (df)
       ID  DEPT DOMAIN
0  201606   271     GE
1  201606   896     FR
2  201608   271     GE
3  201609   271     GE
4  201701   896     FR
5  201606   271     GE
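The same idea can be packaged as a small function. This is only a sketch: `apply_year_domain` and its `year` parameter are hypothetical names, and it differs slightly from the snippet above in that it takes the latest row within the given year for each DEPT, rather than requiring the overall latest row of the DEPT to be in that year. It also returns a copy instead of modifying the frame in place.

```python
import pandas as pd

def apply_year_domain(df, year="2017"):
    """Hypothetical helper: copy df and overwrite DOMAIN per DEPT with the
    DOMAIN of that DEPT's latest row whose ID starts with `year`."""
    in_year = df[df["ID"].astype(str).str.startswith(year)]
    # Stable sort keeps the original row order among equal IDs.
    latest = (
        in_year.sort_values("ID", kind="stable")
        .drop_duplicates("DEPT", keep="last")
        .set_index("DEPT")["DOMAIN"]
    )
    out = df.copy()
    # DEPTs with no row in `year` map to NaN and keep their current DOMAIN.
    out["DOMAIN"] = out["DEPT"].map(latest).fillna(out["DOMAIN"])
    return out

# The six rows visible in the question.
sample = pd.DataFrame({
    "ID": [201606, 201606, 201608, 201609, 201701, 201606],
    "DEPT": [271, 896, 271, 271, 896, 271],
    "DOMAIN": ["GE", "IR", "GE", "GE", "FR", "GE"],
})
result = apply_year_domain(sample)
print(result["DOMAIN"].tolist())  # ['GE', 'FR', 'GE', 'GE', 'FR', 'GE']
```

If the data can contain rows later than 2017, the two variants may disagree, so pick whichever matches the intended rule.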

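Alternatively, use groupby + transform. For each DEPT, take the last DOMAIN and check whether the last ID falls in 2017. With the groupby + transform results in hand, apply the replacement with np.where.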

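import numpy as np

g = df.groupby('DEPT')
# Last DOMAIN in each DEPT group (by row position), broadcast to every row
i = g.DOMAIN.transform('last')
# True where the last ID in the group starts with 2017
j = g.ID.transform('last').astype(str).str[:4] == '2017'

df.DOMAIN = np.where(j, i, df.DOMAIN)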

df

       ID  DEPT DOMAIN
0  201606   271     GE
1  201606   896     FR
2  201608   271     GE
3  201609   271     GE
4  201701   896     FR
5  201606   271     GE
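One caveat worth spelling out: `transform('last')` picks the last row of each group by position, so the snippet above assumes the 2017 row comes last within its DEPT. If the rows may arrive in any order, a possible variant (a sketch only, on a made-up three-row frame named `unsorted`) is to sort by ID before grouping and realign the results to the original index:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 2017 row for DEPT 896 comes first, not last.
unsorted = pd.DataFrame({
    "ID": [201701, 201606, 201608],
    "DEPT": [896, 896, 271],
    "DOMAIN": ["FR", "IR", "GE"],
})

# Group the ID-sorted rows so that 'last' means "latest ID" within each DEPT.
g = unsorted.sort_values("ID", kind="stable").groupby("DEPT")
# Realign to the original row order before combining with np.where.
last_domain = g["DOMAIN"].transform("last").reindex(unsorted.index)
last_is_2017 = (
    g["ID"].transform("last").astype(str).str[:4] == "2017"
).reindex(unsorted.index)

unsorted["DOMAIN"] = np.where(
    last_is_2017.to_numpy(), last_domain.to_numpy(), unsorted["DOMAIN"].to_numpy()
)
print(unsorted["DOMAIN"].tolist())  # ['FR', 'FR', 'GE']
```

Without the sort, the last positional row for DEPT 896 here would be the 201606 row, and nothing would be replaced.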

Source: https://habr.com/ru/post/1693621/

