R grep and exact matches

grep seems to be "greedy" in the way it returns matches. Suppose I have the following data:

Sources <- c(
  "Coal burning plant",
  "General plant",
  "coalescent plantation",
  "Charcoal burning plant"
)

Registry <- seq(from = 1100, to = 1103, by = 1)

df <- data.frame(Registry, Sources)

If I run grep("(?=.*[Pp]lant)(?=.*[Cc]oal)", df$Sources, perl = TRUE, value = TRUE), it returns:

"Coal burning plant"     
"coalescent plantation"  
"Charcoal burning plant" 

However, I only want whole-word matches, i.e. only rows where the words "coal" and "plant" themselves occur. I do not want "coalescent", "plantation", etc. So in this case I want to see only "Coal burning plant".

2 answers

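Use word boundaries \b around each word. A word boundary matches at a position that has a word character on one side and a non-word character (or the start or end of the string) on the other, so "plant" no longer matches inside "plantation" and "coal" no longer matches inside "coalescent" or "Charcoal". Instead of character classes like [Pp], you can also use the inline (?i) modifier for case-insensitive matching.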

grep('(?i)(?=.*\\bplant\\b)(?=.*\\bcoal\\b)', df$Sources, perl = TRUE, value = TRUE)
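A minimal check of this pattern against the question's data, next to the same pattern without \b, shows what the word boundaries change. It uses grepl() instead of grep(value = TRUE) only so the two results are easy to compare:

```r
Sources <- c(
  "Coal burning plant",
  "General plant",
  "coalescent plantation",
  "Charcoal burning plant"
)

# Without word boundaries, "coal" also matches inside "coalescent" and
# "Charcoal", and "plant" matches inside "plantation".
loose <- grepl("(?i)(?=.*plant)(?=.*coal)", Sources, perl = TRUE)

# With \b on both sides, each word has to stand on its own.
strict <- grepl("(?i)(?=.*\\bplant\\b)(?=.*\\bcoal\\b)", Sources, perl = TRUE)

print(Sources[loose])
print(Sources[strict])
```

Only "Coal burning plant" survives the stricter pattern.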


If "coal" always comes before "plant", you do not need lookaheads:

grep("\\b[Cc]oal\\b.*\\b[Pp]lant\\b", Sources, perl = TRUE, value = TRUE)
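This pattern only matches when "coal" comes before "plant". A quick check with a made-up reversed string (not from the question's data), compared against a lookahead version of the same test, shows the difference:

```r
# Made-up strings for illustration: the second one has the words reversed.
x <- c("Coal burning plant", "plant burning coal")

# Order-dependent: "coal" must appear somewhere before "plant".
ordered <- grepl("\\b[Cc]oal\\b.*\\b[Pp]lant\\b", x, perl = TRUE)

# Order-independent: each lookahead scans ahead from the same position,
# so the two words may appear in any order.
any_order <- grepl("(?=.*\\b[Pp]lant\\b)(?=.*\\b[Cc]oal\\b)", x, perl = TRUE)

print(ordered)    # TRUE FALSE
print(any_order)  # TRUE TRUE
```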

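The \b word boundaries make sure only whole words match. If the two words can appear in either order, use lookaheads: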

grep("(?=.*\\b[Pp]lant\\b)(?=.*\\b[Cc]oal\\b)", Sources, 
    perl = TRUE, value = TRUE)
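If more than two words are needed, the lookahead pattern can be built programmatically. The sketch below uses a hypothetical helper, grep_all_words (not part of base R), and assumes the words contain no regex metacharacters, since they are pasted into the pattern unescaped:

```r
# Hypothetical helper: return the elements of x that contain every word in
# `words` as a whole word, in any order, ignoring case.
# Assumes `words` contains no regex metacharacters (they are not escaped).
grep_all_words <- function(x, words) {
  # One lookahead per word, e.g. (?=.*\bcoal\b)(?=.*\bplant\b)
  lookaheads <- paste0("(?=.*\\b", words, "\\b)", collapse = "")
  pattern <- paste0("(?i)", lookaheads)
  grep(pattern, x, perl = TRUE, value = TRUE)
}

Sources <- c(
  "Coal burning plant",
  "General plant",
  "coalescent plantation",
  "Charcoal burning plant"
)

both <- grep_all_words(Sources, c("coal", "plant"))
plant_only <- grep_all_words(Sources, "plant")
print(both)
print(plant_only)
```

With both words it returns only "Coal burning plant"; with "plant" alone it drops "coalescent plantation" but keeps the other three rows.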

Source: https://habr.com/ru/post/1544678/

