Matching and replacing many values in data.table

I have a dataset with many erroneous records. I created a .csv file with two columns: the old (incorrect) names in one column and the corresponding new (correct) names in the other. Now I need to tell R to replace every old name in the data with the correct name.

library(data.table)

testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
    newName = c("Boston", "New York", "Washington DC"))

# The next line fails.
holder = replace(testData, testData[, oldName]==replacements[, oldName],
    replacements[, newName])
2 answers

Here is how I would make this replacement:

setkey(testData, oldName)
setkey(replacements, oldName)

testData[replacements, oldName := newName]
testData
#         oldName
#1:        Boston
#2:      New York
#3:      New York
#4: Washington DC

Note that setkey sorts testData by oldName. If you want to keep the original row order, add an index column before setting the key and sort by it at the end.
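A minimal sketch of that index approach, using the same data as the question. The column name rowId is a hypothetical helper, not part of the original answer; any unused column name would do.

```r
library(data.table)

testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
    newName = c("Boston", "New York", "Washington DC"))

# rowId (hypothetical name) records the original row order before setkey sorts the table.
testData[, rowId := .I]

setkey(testData, oldName)
setkey(replacements, oldName)
testData[replacements, oldName := newName]

# Sort back to the original order, then drop the helper column.
setorder(testData, rowId)
testData[, rowId := NULL]
testData
```

After this, the corrected names appear in the same row order as the original input.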


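The same replacement can be done without setkey, by passing the join column through the on argument. This also leaves the row order of testData unchanged.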

library(data.table)

testData = data.table(
  city = c("Nu York", "Was DC", "Buston",  "Nu York", "Alabama")
)

If the lookup column has the same name in both tables:

replacements = data.table(
  city = c("Buston", "Nu York", "Was DC", "tstDummy"), 
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

testData[replacements, city := city_newName, on=.(city)][]

If the column names differ:

replacements = data.table(
  city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"), 
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

testData[replacements, city := city_newName, on=.(city = city_oldName)][]

In both cases, testData becomes:

            city
1:      New York
2: Washington DC
3:        Boston
4:      New York
5:       Alabama

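Values with no match in replacements (here "Alabama") are left unchanged, and entries in replacements that match nothing in testData (here "tstDummy") are ignored.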
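To see which values will be left untouched, an anti-join can be run before the update. This is a sketch under the same column names as above; the variable names unmatched and unused are hypothetical. It needs to run before the update join, because afterwards the corrected names no longer appear in city_oldName.

```r
library(data.table)

testData = data.table(
  city = c("Nu York", "Was DC", "Buston", "Nu York", "Alabama")
)
replacements = data.table(
  city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"),
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

# Anti-join: rows of testData whose city has no entry in replacements.
unmatched = testData[!replacements, on = .(city = city_oldName)]
unmatched

# The reverse check: replacement entries that match nothing in testData.
unused = replacements[!testData, on = .(city_oldName = city)]
unused
```

With this data, unmatched holds the "Alabama" row and unused holds the "tstDummy" row.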


Source: https://habr.com/ru/post/1531358/

