Finding the best match for a string in R

I want to find the best match for the string "L Hernandez"

in a vector containing the following:

[1] "HernandezOlaf "    "HernandezLuciano " "HernandezAdrian "

I tried this:

library(stringr)
subset(ABC, str_detect(ABC, "L Hernandez") == TRUE)

The desired result is the Hernandez name that also contains a capital L anywhere.

Desired result: HernandezLuciano
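A minimal base R reproduction of the question, assuming `ABC` holds the three names shown above. `grepl()` stands in for `stringr::str_detect()`, which behaves the same way for this pattern. It shows why the attempt returns nothing: "L Hernandez" is not a substring of any element, while the two pieces are each present separately.

```r
# Hypothetical reproduction of the question's vector (name taken from the question).
ABC <- c("HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")

# The whole pattern is searched for as one substring, so nothing matches.
literal <- ABC[grepl("L Hernandez", ABC)]
print(literal)
#> character(0)

# Checking the two pieces separately shows where each one occurs.
has_surname <- grepl("Hernandez", ABC)  # TRUE for every element
has_capital_l <- grepl("L", ABC)        # case-sensitive, so only "HernandezLuciano "
print(ABC[has_surname & has_capital_l])
#> [1] "HernandezLuciano "
```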

3 answers

Perhaps this will help:

vec1 <- c("L Hernandez", "HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")
grep("L ?Hernandez|Hernandez ?L", vec1, value = TRUE)
#[1] "L Hernandez"       "HernandezLuciano "

Update

variable <- "L Hernandez"

v1 <- gsub(" ", " ?", variable) # make the space optional: "L ?Hernandez"
v2 <- gsub("([[:alpha:]]+) ([[:alpha:]]+)", "\\2 ?\\1", variable) # swap the two words and make the space optional: "Hernandez ?L"

As @rawr suggested in the comments, you can also use strsplit to split variable into its words.

grep(paste(v1, v2, sep = "|"), vec1, value = TRUE)
#[1] "L Hernandez"       "HernandezLuciano "
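The two patterns from the update can be wrapped in a small function. This is a sketch: `match_either_order()` is a hypothetical name, and it assumes the query is exactly two words separated by a single space and that the words contain no regex metacharacters. It splits the query with `strsplit()`, as suggested in the comments.

```r
# Hypothetical helper: match "first second" in either word order,
# with the space between the words optional.
match_either_order <- function(query, x) {
  words <- strsplit(query, " ", fixed = TRUE)[[1]]
  stopifnot(length(words) == 2)  # assumption: exactly two words
  # e.g. "L ?Hernandez|Hernandez ?L" for query "L Hernandez"
  pattern <- paste0(words[1], " ?", words[2], "|", words[2], " ?", words[1])
  grep(pattern, x, value = TRUE)
}

vec1 <- c("L Hernandez", "HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")
hits <- match_either_order("L Hernandez", vec1)
print(hits)
```

The generated pattern is the same one built by hand above, so the result is the same pair of elements.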

You can use the function agrep for approximate string matching. With its default settings, however, it matches every element:

agrep("L Hernandez", c("HernandezOlaf ",    "HernandezLuciano ", "HernandezAdrian "))
[1] 1 2 3

Reversing the pattern to "Hernandez L", the order in which the names are stored, still matches everything at the default threshold:

agrep("Hernandez L", c("HernandezOlaf ",    "HernandezLuciano ", "HernandezAdrian "))
[1] 1 2 3

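Lowering max.distance to 0.01 (a fraction of the pattern length, rounded up to a single edit here) leaves only the intended element:

agrep("Hernandez L", c("HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian "), max.distance = 0.01)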
[1] 2
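To get a single best match instead of tuning a threshold, one option (not part of the answer above) is base R's `adist()` with `partial = TRUE`, which computes the substring edit distance that `agrep()` uses; `which.min()` then picks the closest element. A sketch under the default costs (1 per insertion, deletion, or substitution); ties resolve to the first element.

```r
x <- c("HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")

# 1 x 3 matrix: edit distance from the pattern to the closest substring of each element.
d <- adist("Hernandez L", x, partial = TRUE)
print(d)

# Position and value of the closest element.
best <- which.min(d)
print(x[best])
```

Only "HernandezLuciano " contains "HernandezL", one deletion (the space) away from the pattern; the other two names need at least two edits.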


If you only need the full names that contain a capital L, you can use the following:

vec1[grepl("Hernandez", vec1) & grepl("L.", vec1)]
[1] "L Hernandez"       "HernandezLuciano "

or

vec1[grepl("Hernandez", vec1) & grepl("L[[:alpha:]]", vec1)]
[1] "HernandezLuciano "

Both versions first require a match on "Hernandez". The first then requires a capital "L" followed by any single character (a letter or a space); the second requires a letter immediately after the capital "L".

Note that the bracket must not be escaped: in "L\\[[:alpha:]]" the \\[ matches a literal "[", so nothing matches:

vec1[grepl("Hernandez", vec1) & grepl("L\\[[:alpha:]]", vec1)]
character(0)
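If a single pattern is preferred, Perl-compatible regular expressions (`perl = TRUE`) support lookaheads, which can express both conditions at once. This is a different technique from the two `grepl()` calls above, shown only as a sketch:

```r
vec1 <- c("L Hernandez", "HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")

# Two lookaheads anchored at the start: the string must contain "Hernandez"
# somewhere AND a capital L immediately followed by a letter somewhere.
pattern <- "^(?=.*Hernandez)(?=.*L[[:alpha:]])"
hits <- vec1[grepl(pattern, vec1, perl = TRUE)]
print(hits)
```

"L Hernandez" is excluded because its only capital L is followed by a space.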

Source: https://habr.com/ru/post/1547287/

