Search for a string in a variable and return the matched string

I need help matching multiple strings stored in a vector against an address stored in a column of a data frame (data.table). My database is quite large, with about 1 million records, so I would prefer to use data.table.

Below is a dummy sample of data and vector -

my <- data.frame(add=c("50, nutan nagar Mum41","50, nutan Mum88 Maha","77, amar nagar Blr79 Bang","54, veer build Chennai3242","amar 755 Blr 400018"))

vec1 <- c("Mum","Blr","Chennai")

I need to search for every string from vec1 in every address in my add variable. If any string from vec1 is found in an address, the matched string should be returned in a new result column. If there are multiple matches, the first matching value should be returned; that is, if "Mum" and "Blr" both occur in the same address, the result should be "Mum".

Based on the dummy data, the expected result will be -

my$result <- c("Mum","Mum","Blr","Chennai","Blr")

I tried using grep / grepl, but they complain that "argument 'pattern' has length > 1 and only the first element will be used".

I also looked at str_match, but I need the matched string itself rather than just TRUE/FALSE.

How can I do this?


str_extract

library(stringr)
str_extract(my$add, paste(vec1, collapse="|"))
#[1] "Mum"     "Mum"     "Blr"     "Chennai" "Blr"   

base R

regmatches(my$add, regexpr(paste(vec1, collapse="|"), my$add))
#[1] "Mum"     "Mum"     "Blr"     "Chennai" "Blr"    
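
One caveat with the base R version: regmatches() drops elements that have no match, so on real data the result can come out shorter than the column. Below is a minimal sketch of one way to keep the lengths aligned, assuming non-matching addresses should get NA. The sixth address and the helper name extract_first are made up for illustration.

```r
# Sample data from the question, plus one hypothetical address with no match
my <- data.frame(
  add = c("50, nutan nagar Mum41", "50, nutan Mum88 Maha",
          "77, amar nagar Blr79 Bang", "54, veer build Chennai3242",
          "amar 755 Blr 400018", "12, unknown road 560001"),
  stringsAsFactors = FALSE
)
vec1 <- c("Mum", "Blr", "Chennai")

# Hypothetical helper: leftmost match of any pattern, NA where nothing matches
extract_first <- function(x, patterns) {
  m <- regexpr(paste(patterns, collapse = "|"), x)  # -1 marks "no match"
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)                  # matched elements only, in order
  out
}

my$result <- extract_first(my$add, vec1)
my$result
```

Since the helper takes and returns plain character vectors, the same call should also work inside a data.table assignment such as my[, result := extract_first(add, vec1)].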

Another option in base R:

# For each address, check every vec1 entry with grepl and keep the first hit in vec1 order
vec1[sapply(as.data.frame(do.call(rbind,lapply(vec1, 
        function(x) {grepl(x,my$add)}))), function(y) {min(which(y))})]

Output:

[1] "Mum"     "Mum"     "Blr"     "Chennai" "Blr"   
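
The approaches agree on the sample data, but they can differ when an address contains more than one entry of vec1: the collapsed regex returns whichever match occurs earliest in the address, while the grepl version returns whichever entry comes first in vec1. Here is a small sketch of the difference, using a made-up address and a hypothetical helper first_in_vec1. Using fixed = TRUE is an assumption that the entries are literal strings rather than regular expressions.

```r
vec1 <- c("Mum", "Blr", "Chennai")
addr <- c("12, Blr road Mum41", "50, nutan nagar Mum41")  # first address is made up

# Leftmost match in the address (what the collapsed regex returns)
leftmost <- regmatches(addr, regexpr(paste(vec1, collapse = "|"), addr))

# Hypothetical helper: first entry of `candidates` found anywhere in each address
first_in_vec1 <- function(x, candidates) {
  # Logical matrix: one row per address, one column per candidate
  hits <- matrix(
    unlist(lapply(candidates, function(p) grepl(p, x, fixed = TRUE))),
    nrow = length(x)
  )
  idx <- apply(hits, 1, function(r) which(r)[1])  # NA if no candidate matches
  candidates[idx]
}

by_priority <- first_in_vec1(addr, vec1)
leftmost     # "Blr" "Mum"
by_priority  # "Mum" "Mum"
```

Which behaviour counts as "the first matching value" depends on whether the order in the address or the order in vec1 should win, so it is worth checking against a few real addresses.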



Source: https://habr.com/ru/post/1682969/

