Extract a string of words between two specific words in R

Question

I have the following string: "PRODUCT colgate good but not goodOKAY"

I want to extract all the words between PRODUCT and OKAY.


Tags: regex, r

gyaanseeker Feb 01 '15 at 20:19


5 answers

G. grothendieck · Answer 1 · 2015-02-01T22:45:20+0000

This can be done with sub:

 s <- "PRODUCT colgate good but not goodOKAY"
 sub(".*PRODUCT *(.*?) *OKAY.*", "\\1", s)

giving:

 [1] "colgate good but not good"

No packages are needed.
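One thing to keep in mind: sub returns an element unchanged when the pattern does not match, so a string that lacks PRODUCT or OKAY comes back whole. Below is a minimal sketch, assuming you would rather get NA for such elements; the example vector s2 and the NA convention are illustrative choices, not part of the answer above.

```r
pat <- ".*PRODUCT *(.*?) *OKAY.*"
s2 <- c("PRODUCT colgate good but not goodOKAY", "no markers here")

# sub() is vectorised, but non-matching elements are returned as-is
plain <- sub(pat, "\\1", s2)

# Return NA where the delimiters are missing instead of the original text
res <- ifelse(grepl(pat, s2), sub(pat, "\\1", s2), NA_character_)
res
```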

Here is a regular expression visualization:

 .*PRODUCT *(.*?) *OKAY.*

[Figure: Debuggex visualization of the regular expression]
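The ? in (.*?) makes the capture lazy: it stops at the first OKAY and leaves any spaces before OKAY to the " *". The sketch below contrasts it with a greedy (.*) on a made-up string that contains OKAY twice. Passing perl = TRUE is an assumption added here so that both patterns run with Perl-style backtracking semantics.

```r
s3 <- "PRODUCT colgate OKAY fluoride OKAY"

# Lazy: the capture ends at the first OKAY, spaces go to " *"
lazy <- sub(".*PRODUCT *(.*?) *OKAY.*", "\\1", s3, perl = TRUE)

# Greedy: the capture runs on to the last OKAY and keeps the space before it,
# because " *" is then allowed to match zero spaces
greedy <- sub(".*PRODUCT *(.*) *OKAY.*", "\\1", s3, perl = TRUE)

lazy    # "colgate"
greedy  # "colgate OKAY fluoride " (note the trailing space)
```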

Gregor · Answer 2 · 2015-02-01T20:30:40+0000

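 library(stringr)
 x <- "PRODUCT colgate good but not goodOKAY"
 # stringr's default regex engine (ICU) supports lookarounds, so no perl() wrapper is needed
 str_extract(string = x, pattern = "(?<=PRODUCT).*(?=OKAY)")
 # [1] " colgate good but not good"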

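(?<=PRODUCT): a lookbehind; the match must be preceded by PRODUCT, which is not itself included.

.*: matches any characters except newlines.

(?=OKAY): a lookahead; the match must be followed by OKAY, which is not itself included.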

I should add that you don't need the stringr package for this; base functions such as sub and gsub work fine. I use stringr for its consistent syntax: the functions for extracting, replacing, detecting, etc. have predictable, understandable names, and their arguments come in a consistent order. Using stringr saves me from checking the documentation every time.
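For comparison, the same lookaround pattern "(?<=PRODUCT).*(?=OKAY)" can be used without stringr via regexpr and regmatches; perl = TRUE is required for lookbehind in base R. This is a sketch of one base R route, not the only one. Because the lookbehind does not consume the space after PRODUCT, the match starts with a space, which trimws removes.

```r
x <- "PRODUCT colgate good but not goodOKAY"

# regexpr() locates the match; perl = TRUE enables lookbehind/lookahead
m <- regexpr("(?<=PRODUCT).*(?=OKAY)", x, perl = TRUE)

# regmatches() returns the matched text, which begins with a space
raw <- regmatches(x, m)

out <- trimws(raw)
out  # "colgate good but not good"
```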

Sven hohenstein · Answer 3 · 2015-02-01T20:26:32+0000

You can use gsub:

 vec <- "PRODUCT colgate good but not goodOKAY"
 gsub(".*PRODUCT\\s*|OKAY.*", "", vec)
 # [1] "colgate good but not good"
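The pattern works by deleting two pieces independently: ".*PRODUCT\\s*" removes everything up to and including PRODUCT and the spaces after it, and "OKAY.*" removes OKAY and everything after it. The sketch below runs it on a small made-up vector to show two side effects: spaces before OKAY are not removed (trimws handles that), and a string without either marker comes back unchanged.

```r
vec <- c("PRODUCT colgate good but not goodOKAY",
         "PRODUCT toothpaste OKAY",
         "no markers here")

# Remove the prefix and the suffix wherever each one matches
res <- gsub(".*PRODUCT\\s*|OKAY.*", "", vec)

# The second element keeps the space that preceded OKAY; trim it
cleaned <- trimws(res)
cleaned
```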

Tyler rinker · Answer 4 · 2015-02-02T03:39:54+0000

You can use the rm_between function from the qdapRegex package. It takes the string and the left and right boundaries:

 library(qdapRegex)
 x <- "PRODUCT colgate good but not goodOKAY"
 rm_between(x, "PRODUCT", "OKAY", extract = TRUE)
 ## [[1]]
 ## [1] "colgate good but not good"
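If you want the same left/right-boundary interface without a package dependency, here is a rough base R sketch. The function extract_between is hypothetical (it is not part of qdapRegex and is not how rm_between is implemented); it relies on PCRE's \Q...\E quoting so that the boundaries are matched literally, and returns NA where no match is found.

```r
# Hypothetical helper: text between `left` and the nearest following `right`,
# with surrounding whitespace dropped; NA when there is no match
extract_between <- function(x, left, right) {
  # \Q...\E makes PCRE treat the boundaries as literal text
  # (this breaks if a boundary itself contains "\E")
  pattern <- paste0("\\Q", left, "\\E\\s*(.*?)\\s*\\Q", right, "\\E")
  groups <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element is c(full_match, capture) on success, character(0) otherwise
  vapply(groups,
         function(g) if (length(g) == 2) g[2] else NA_character_,
         character(1))
}

x <- c("PRODUCT colgate good but not goodOKAY", "no markers here")
extract_between(x, "PRODUCT", "OKAY")
```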

Moody_Mudskipper · Answer 5 · 2019-10-08T17:13:52+0000

You can use the unglue package:

 library(unglue)
 x <- "PRODUCT colgate good but not goodOKAY"
 unglue_vec(x, "PRODUCT {out}OKAY")
 #> [1] "colgate good but not good"
