R: extract a substring from the end of a pattern up to the first occurrence of a character

I have been fighting for a few hours to get this match-and-replace working with gsub in R, and I still fail. I am trying to match the pattern "Reason:" in a string and return everything AFTER this pattern up to the first dot (.). For example:

Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.

should return "Not interested".

4 answers

Here's the solution:

s <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

sub(".*Reason: (.*?)\\..*", "\\1", s)
# [1] "Not interested"
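
One caveat with the sub approach: when the pattern does not match, sub returns the input unchanged rather than NA or an empty string. A minimal sketch of that behaviour (the vector name `lines` and the shortened sample strings are made up for illustration):

```r
# Hypothetical input: the first element has no "Reason:" field.
lines <- c("no match example",
           "Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")

res <- sub(".*Reason: (.*?)\\..*", "\\1", lines)
res
# [1] "no match example" "Not interested"
```

So a non-matching line passes through silently, which is easy to mistake for a valid result.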

Update (in response to comments):

If you also have lines that do not match the pattern, I recommend using regexpr instead of sub:

s2 <- c("no match example",
        "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")

match <- regexpr("(?<=Reason: ).*?(?=\\.)", s2, perl = TRUE)
ifelse(match == -1, NA, regmatches(s2, match))
# [1] NA               "Not interested"
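
Note that regmatches drops the elements that did not match, so the ifelse call above relies on recycling and can misalign results when several input lines are involved. A sketch of a more defensive variant that fills a pre-allocated NA vector (the names `s_multi`, `hits` and `reason`, and the extra sample strings, are illustrative and not from the original answer):

```r
s_multi <- c("Reason: Too expensive. ChannelID: WEB.",
             "no match example",
             "Reason: Not interested. ChannelID: CARE.")

hits <- regexpr("(?<=Reason: ).*?(?=\\.)", s_multi, perl = TRUE)

# Start with NA everywhere, then fill only the positions that matched.
reason <- rep(NA_character_, length(s_multi))
reason[hits != -1] <- regmatches(s_multi, hits)
reason
# [1] "Too expensive"  NA               "Not interested"
```

Each result stays at the index of the line it came from, regardless of how many lines fail to match.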

For the second example, you can use the following regular expression:

s3 <- "Delete Payment Arrangement of type Proof of Payment for BAN : 907295267 on date 02/01/2014, from reason PAERR."

# a)
sub(".*type (.*?) for.*", "\\1", s3)
# [1] "Proof of Payment"

# b)
match <- regexpr("(?<=type ).*?(?= for)", s3, perl = TRUE)
ifelse(match == -1, NA, regmatches(s3, match))
# [1] "Proof of Payment"

Another option is the stringr package:

library(stringr)

rec <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
# str_match returns a matrix: column 1 is the full match, column 2 the capture group
str_match(rec, "Reason: ([a-zA-Z0-9 ]+)\\.")[, 2]
## [1] "Not interested"

This will work:

x <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

library(qdap)
genXtract(x, "Reason:", ".")

##     Reason:  :  . 
## " Not interested" 

With regexpr and regmatches:

str <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
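# Lookbehind anchors the match right after "Reason: "; [^.]+ stops before the first dot
m <- regexpr("(?<=Reason: )[^.]+", str, perl = TRUE)
regmatches(str, m)
# [1] "Not interested"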
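
The same lookbehind idea can be wrapped in a small reusable function. This is only a sketch: `extract_field` is a hypothetical helper, and it assumes the label contains no regex metacharacters and is followed by ": " in the text.

```r
# Hypothetical helper: text after "<label>: " up to (not including) the next dot.
# Returns NA for elements where the label is absent or nothing follows it.
extract_field <- function(x, label) {
  pattern <- paste0("(?<=", label, ": )[^.]+")
  hits <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[hits != -1] <- regmatches(x, hits)
  out
}

rec <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
extract_field(rec, "Reason")
# [1] "Not interested"
extract_field(rec, "MSISDN")
# [1] "7183067962"
```

Because the label is pasted into the pattern verbatim, a label containing characters such as "(" or "+" would need escaping first.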

Source: https://habr.com/ru/post/1531958/

