Transforming character strings in R

I need to merge two data frames in R. The two data frames share a common identifier variable, the name of a person. However, the names in one data frame are partly capitalized, while in the other they are in lower case. In addition, the names appear in reverse order. Here is a sample from the data frames:

DataFrame1$Name:
"Van Brempt Kathleen"
"Gräßle Ingeborg"
"Gauzès Jean-Paul"
"Winkler Iuliu" 

DataFrame2$Name:
"Kathleen VAN BREMPT" 
"Ingeborg GRÄSSLE"
"Jean-Paul GAUZÈS"
"Iuliu WINKLER"

Is there a way in R to make these two variables usable as an identifier for merging the data frames?

Best, Thomas

+3
4 answers

Here's a complete solution that combines the two partial methods proposed so far (and overcomes the concerns expressed by Spacedman about “matching Grassle to Graßle”):

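# Swap "First LAST" into "LAST First" so the word order matches DataFrame1
DataFrame2$revname <- gsub("([^\\s]*)\\s(.*)", "\\2 \\1", DataFrame2$Name, perl = TRUE)
# For each swapped name, find the approximately matching row of DataFrame1 (case-insensitive)
DataFrame2$agnum <- sapply(tolower(DataFrame2$revname), agrep, tolower(DataFrame1$Name))
# Add an explicit row index to DataFrame1 and merge on it
DataFrame1$num <- 1:nrow(DataFrame1)
merge(DataFrame1, DataFrame2, by.x = "num", by.y = "agnum")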

Output:

  num              Name.x              Name.y             revname
1   1 Van Brempt Kathleen Kathleen VAN BREMPT VAN BREMPT Kathleen
2   2     Gräßle Ingeborg    Ingeborg GRÄSSLE    GRÄSSLE Ingeborg
3   3    Gauzès Jean-Paul    Jean-Paul GAUZÈS    GAUZÈS Jean-Paul
4   4       Winkler Iuliu       Iuliu WINKLER       WINKLER Iuliu

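Alternatively, the step of adding an index column to DataFrame1 can be skipped by merging on its row names (which are 1:nrow(DataFrame1) by default):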

merge(DataFrame1, DataFrame2, by.x="row.names", by.y="agnum")
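A small sketch of how the row-name variant behaves, using made-up two-row data frames. The agnum values are typed in by hand here instead of being computed with agrep, and the names df1, df2 and merged are only for illustration. When merging on "row.names", merge() adds a character key column called Row.names to the result:

```r
# Made-up sample data; agnum is entered by hand for this sketch
df1 <- data.frame(Name = c("Van Brempt Kathleen", "Winkler Iuliu"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("Iuliu WINKLER", "Kathleen VAN BREMPT"),
                  agnum = c(2L, 1L),   # row of df1 that each df2 name refers to
                  stringsAsFactors = FALSE)

# Merge the row names of df1 against the index column of df2
merged <- merge(df1, df2, by.x = "row.names", by.y = "agnum")

# The key column is named Row.names and holds character values
print(names(merged))
print(merged)
```

Because the key is character, larger tables may come back sorted as "1", "10", "2", and so on, so it is probably safer not to rely on the row order of the result.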


+2

You can use gsub to swap the names around:

> names
[1] "Kathleen VAN BREMPT" "jean-paul GAULTIER" 
> gsub("([^\\s]*)\\s(.*)","\\2 \\1",names,perl=TRUE)
[1] "VAN BREMPT Kathleen" "GAULTIER jean-paul" 
> 

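This assumes the first name contains no spaces, since the pattern treats everything before the first whitespace as the first name and everything after it as the surname. Then use tolower() or toupper() to put both columns in the same case, and match() to line them up.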
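The swap, case-fold and match() steps can be sketched together as follows. The two name vectors are cut-down, ASCII-only versions of the question's data, and the variable names are made up for the example:

```r
# ASCII-only subset of the names from the question
df1_names <- c("Van Brempt Kathleen", "Winkler Iuliu")
df2_names <- c("Iuliu WINKLER", "Kathleen VAN BREMPT")

# Move the first whitespace-delimited token (the first name) to the end
swapped <- sub("^(\\S+)\\s+(.*)$", "\\2 \\1", df2_names, perl = TRUE)

# Compare in lower case; idx[i] is the row of df1_names matching df2_names[i]
idx <- match(tolower(swapped), tolower(df1_names))

print(swapped)
print(idx)
```

match() returns NA for any name without an exact counterpart, so checking anyNA(idx) before merging is a cheap way to spot names that still differ.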

Matching Grassle to Graßle is a concern, though.
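One possible way around the ß problem, as a sketch only: replace ß with "ss" before lower-casing, so both spellings end up with the same key. The helper name fold_name is made up for this example, and it deals with ß alone. How tolower() treats other non-ASCII letters such as Ä depends on the locale, so that part is an assumption to verify on your own system:

```r
# Hypothetical helper: expand the German sharp s, then lower-case
fold_name <- function(x) {
  x <- gsub("\u00df", "ss", x, fixed = TRUE)  # "\u00df" is the letter ß
  tolower(x)
}

a <- fold_name("Gra\u00dfle Ingeborg")  # spelling with ß
b <- fold_name("GRASSLE Ingeborg")      # capitalized spelling with SS

print(a)
print(a == b)
```

The folded strings can then serve as the key for match() or merge() in place of the raw names.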

Barry

+3

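To deal with the upper/lower case difference, create a lower-case version of the name in each data frame: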

DataFrame1$NameLower <- tolower(DataFrame1$Name)
DataFrame2$NameLower <- tolower(DataFrame2$Name)

Then merge on the new column:

MergedDataFrame <- merge(DataFrame1, DataFrame2, by="NameLower")
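A self-contained sketch of the lower-case key approach with made-up data. The Group and Score columns and the extra "DOE Jane" row are invented for illustration, and the names in df2 are assumed to be in surname-first order already. Adding all = TRUE keeps rows that found no partner, which makes leftover mismatches easy to see:

```r
# Made-up sample data; Group, Score and "DOE Jane" are illustrative only
df1 <- data.frame(Name = c("Winkler Iuliu", "Van Brempt Kathleen"),
                  Group = c("g1", "g2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("VAN BREMPT Kathleen", "WINKLER Iuliu", "DOE Jane"),
                  Score = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Case-insensitive merge key
df1$NameLower <- tolower(df1$Name)
df2$NameLower <- tolower(df2$Name)

# all = TRUE keeps unmatched rows from both sides, filled with NA
merged <- merge(df1, df2, by = "NameLower", all = TRUE)

# Rows where one side is missing did not find a partner
unmatched <- merged[is.na(merged$Name.x) | is.na(merged$Name.y), ]
print(unmatched)
```

With the data from the question the word order differs as well, so the lower-case keys would only line up after the names have been swapped into the same order.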

0

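In addition to gsub, take a look at agrep, which does approximate (fuzzy) matching. Combined with sapply, it returns the index of the matching row for each name, for example: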

> sapply( c('newyork', 'NEWJersey', 'Vormont'), agrep, x=state.name, ignore.case=TRUE )
  newyork NEWJersey   Vormont 
       32        30        45 
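agrep can also return no index, or several, for a single pattern, and in that case sapply gives back a list instead of a plain vector. A defensive sketch, reusing the built-in state.name vector from the example above plus one invented query ("qqqqqqqq") that should match nothing; the variable names are illustrative:

```r
# The last query is made up and is not expected to match any state
queries <- c("newyork", "NEWJersey", "Vormont", "qqqqqqqq")

# lapply always returns a list, one integer vector of hits per query
hits <- lapply(queries, agrep, x = state.name, ignore.case = TRUE)

# Keep the index only when the match is unique, otherwise record NA
matched <- vapply(hits,
                  function(h) if (length(h) == 1) h else NA_integer_,
                  integer(1))

print(data.frame(query = queries, n_hits = lengths(hits), index = matched))
```

Queries with an NA index are the ones worth checking by hand before using the indices as a merge key.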

0

Source: https://habr.com/ru/post/1762042/

