Replace values in a column with a VLOOKUP-style lookup from another DataFrame, only where a match exists

I want to overwrite the values in df1.Name based on the mapping table in (df2.Name1, df2.Name2). However, not all values in df1.Name exist in df2.Name1.

df1:

Name
Alex
Maria 
Marias
Pandas
Coala

df2:

Name1   Name2
Alex    Alexs
Marias  Maria
Coala   Coalas

Expected Result:

Name
Alexs
Maria
Maria
Pandas
Coalas

I tried several solutions found online, for example using the map function. After turning df2 into a dictionary, I used df1.Name = df1.Name.map(Dictionary), but this produces NaN for every value not included in df2, as shown below.

Name
Alexs
Maria
Maria
NaN
Coalas

I'm not sure how to use an if statement to replace only the values that exist in df2 and keep the rest as they are in df1. I also tried writing a function with if statements, but it failed.

How could I approach this problem?


Use replace:

df1.Name.replace(df2.set_index('Name1').Name2.to_dict())
Out[437]: 
0     Alexs
1     Maria
2     Maria
3    Pandas
4    Coalas
Name: Name, dtype: object

Use replace with a nested dictionary:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alex', 'Maria', 'Marias', 'Pandas', 'Coala']})
df2 = pd.DataFrame({'Name1': ['Alex', 'Marias', 'Coala'],
                    'Name2': ['Alexs', 'Maria', 'Coalas']})

# Build a nested dict from df2: {column name: {old value: new value}}
d = {"Name": dict(zip(df2["Name1"], df2["Name2"]))}
# Alternative suggested by Wen for building the dictionary:
# d = {"Name": df2.set_index('Name1').Name2.to_dict()}

df1.replace(d)   # Use df1.replace(d, inplace=True) if you want this in place

    Name
0   Alexs
1   Maria
2   Maria
3   Pandas
4   Coalas

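replace accepts a nested dictionary keyed by column name, here "Name", whose value is the mapping to apply to that column: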

{"Name": {old_1: new_1, old_2: new_2...}}  

-> In column "Name", replace old_1 with new_1, old_2 with new_2, and so on.

The more concise way to build the dictionary was suggested by Wen.
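To illustrate why the outer key matters, here is a minimal sketch. The second column, 'Alias', is hypothetical and not part of the question; it is only there to show that the nested form leaves other columns alone.

```python
import pandas as pd

# Hypothetical frame: 'Alias' is an invented column for illustration only.
df = pd.DataFrame({'Name': ['Alex', 'Pandas'],
                   'Alias': ['Alex', 'Coala']})

# Outer key = column to touch, inner dict = old value -> new value.
mapping = {'Name': {'Alex': 'Alexs', 'Coala': 'Coalas'}}

# Only 'Name' is rewritten; values without a match are kept as they are.
result = df.replace(mapping)
print(result)
```

With a flat dictionary ({'Alex': 'Alexs', ...}) the replacement would apply to every column, so the nested form is the safer choice when the frame has more than one column.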


Use map with combine_first:

df1['Name'].map(df2.set_index('Name1')['Name2']).combine_first(df1['Name'])

Output:

0     Alexs
1     Maria
2     Maria
3    Pandas
4    Coalas
Name: Name, dtype: object
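For anyone wondering what the two steps do, here is a rough sketch that splits them apart and uses fillna in place of combine_first. The variable names lookup, mapped and result are mine, not from the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alex', 'Maria', 'Marias', 'Pandas', 'Coala']})
df2 = pd.DataFrame({'Name1': ['Alex', 'Marias', 'Coala'],
                    'Name2': ['Alexs', 'Maria', 'Coalas']})

# Series indexed by the old name, holding the new name.
lookup = df2.set_index('Name1')['Name2']

# map() yields NaN wherever df1.Name has no entry in the lookup.
mapped = df1['Name'].map(lookup)

# Fill those gaps with the original values from df1.
result = mapped.fillna(df1['Name'])
print(result.tolist())
```

One caveat: if df2.Name2 itself contained NaN, this would fall back to the original name for that row as well.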

merge:

In [27]: df1['Name'] = df1.merge(df2.rename(columns={'Name1':'Name'}), how='left') \
                          .ffill(axis=1)['Name2']

In [28]: df1
Out[28]:
     Name
0   Alexs
1   Maria
2   Maria
3  Pandas
4  Coalas
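A variant of the same idea, sketched with left_on/right_on and an explicit fillna instead of rename plus ffill(axis=1). This is not the answer's code, just an alternative spelling that some may find easier to read; merged and result are my own names.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alex', 'Maria', 'Marias', 'Pandas', 'Coala']})
df2 = pd.DataFrame({'Name1': ['Alex', 'Marias', 'Coala'],
                    'Name2': ['Alexs', 'Maria', 'Coalas']})

# Left join keeps every row of df1; Name2 is NaN where no match exists.
merged = df1.merge(df2, how='left', left_on='Name', right_on='Name1')

# Take the mapped name where present, otherwise keep the original.
result = merged['Name2'].fillna(merged['Name'])
print(result.tolist())
```

Note that a merge adds rows if df2.Name1 contains duplicate keys, so this assumes the mapping table has one row per old name.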

Python's dict.get() accepts a default value. So if you build a translation dict, it is easy to return the original value whenever a lookup finds nothing, for example:

The code:

translate = {x: y for x, y in df2[['Name1', 'Name2']].values}
new_names = [translate.get(x, x) for x in df1['Name']]

Test code:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Alex', 'Maria', 'Marias', 'Pandas', 'Coala']})
df2 = pd.DataFrame({'Name1': ['Alex', 'Marias', 'Coala'],
                    'Name2': ['Alexs', 'Maria', 'Coalas']})

print(df1)
print(df2)

translate = {x: y for x, y in df2[['Name1', 'Name2']].values}
print([translate.get(x, x) for x in df1['Name']])

Test results:

     Name
0    Alex
1   Maria
2  Marias
3  Pandas
4   Coala

    Name1   Name2
0    Alex   Alexs
1  Marias   Maria
2   Coala  Coalas

['Alexs', 'Maria', 'Maria', 'Pandas', 'Coalas']

Source: https://habr.com/ru/post/1692228/

