Negation in R, how can I replace words after negation in R?

I am following up on a question asked here about how to add the prefix "not_" to the word following a negation.

In the comments MrFlick suggested a solution using regex gsub("(?<=(?:\\bnot|n't) )(\\w+)\\b", "not_\\1", x, perl=T).

I would like to edit this regex to add the not_ prefix to all words following "not" or "n't" until some punctuation is reached.

Adapting the cptn example, I would like to convert:

x <- "They didn't sell the company, and it went bankrupt"

to:

"They didn't not_sell not_the not_company, and it went bankrupt"

Can a backreference still do the trick here? If so, an example would be greatly appreciated. Thanks!

3 answers

You can use

(?:\bnot|n't|\G(?!\A))\s+\K(\w+)\b

and replace with not_\1.

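  • (?:\bnot|n't|\G(?!\A)) - any one of three alternatives:
    • \bnot - the word not, preceded by a word boundary
    • n't - the literal text n't
    • \G(?!\A) - the end of the previous successful match (but not the start of the string)
  • \s+ - 1 or more whitespace characters
  • \K - match reset operator: discards the text matched so far, so it is not replaced
  • (\w+) - Group 1 (referred to as \1 in the replacement): 1 or more word characters (letters, digits, or _)
  • \b - a word boundary.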

R demo:

x <- "They didn't sell the company, and it went bankrupt"
gsub("(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", x, perl=TRUE)
## => [1] "They didn't not_sell not_the not_company, and it went bankrupt"
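
As a hedged sketch, the same pattern can be wrapped in a small helper and applied to a whole character vector. The function name negate_scope is hypothetical and not part of the original answer. Because gsub processes each string separately, the \G chain never carries over from one element to the next, and it stops at the first punctuation mark because \s+ cannot match there.

```r
# Hypothetical helper: prefix every word between a negation ("not" or "n't")
# and the next non-whitespace, non-word character with "not_".
negate_scope <- function(x) {
  gsub("(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", x, perl = TRUE)
}

negate_scope(c("I do not like it. It is fine", "no negation here"))
## [1] "I do not not_like not_it. It is fine" "no negation here"
```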

First, split the string on the punctuation you want to stop at. For example:

x <- "They didn't sell the company, and it went bankrupt. Then something else"
x_split <- strsplit(x, split = "[,.]")
x_split
## [[1]]
## [1] "They didn't sell the company" " and it went bankrupt"        " Then something else"

Then apply the regex to each element of x_split. Finally, merge the pieces back together (if needed).
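
A minimal sketch of this split-and-process idea follows, under some assumptions: it uses gregexpr and regmatches rather than strsplit so that the punctuation is kept when the pieces are put back, and the helper name prefix_after_negation is hypothetical. It treats "," and "." as the only delimiters, as in the example above.

```r
# Hypothetical helper: within one punctuation-free piece, prefix every word
# that follows the first negation ("not" or "n't") with "not_".
prefix_after_negation <- function(piece) {
  m <- regexpr("\\bnot\\b|n't", piece, perl = TRUE)
  if (m == -1) return(piece)                       # no negation in this piece
  neg_end <- as.integer(m) + attr(m, "match.length") - 1
  before <- substr(piece, 1, neg_end)              # text up to and including the negation
  after  <- substr(piece, neg_end + 1, nchar(piece))
  paste0(before, gsub("(\\w+)", "not_\\1", after, perl = TRUE))
}

x <- "They didn't sell the company, and it went bankrupt. Then something else"
m <- gregexpr("[^,.]+", x)                         # runs of text between delimiters
pieces <- regmatches(x, m)[[1]]
# Write the processed pieces back in place; the delimiters stay untouched.
regmatches(x, m) <- list(vapply(pieces, prefix_after_negation,
                                character(1), USE.NAMES = FALSE))
x
## [1] "They didn't not_sell not_the not_company, and it went bankrupt. Then something else"
```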


Another option, using a single gsub call:

x <- "They didn't sell the company, and it did not go bankrupt. That it" 

gsub("((^|[[:punct:]]).*?(not|n't)|[[:punct:]].*?((?<=\\s)[[:punct:]]|$))(*SKIP)(*FAIL)|\\s", 
     " not_", x, 
     perl = TRUE)

# [1] "They didn't not_sell not_the not_company, and it did not not_go not_bankrupt. That it"

Explanation:

This uses the (*SKIP)(*FAIL) trick to skip over any pattern that you do not want the regular expression to match. It basically replaces every space with " not_", except for spaces that lie between:

  • Start of line or punctuation and "not" or "n't", or

  • Punctuation and punctuation (not followed by a space) or end of line
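
To see the (*SKIP)(*FAIL) mechanism in isolation, here is a smaller sketch on an unrelated toy string (the string and the variable name y are illustrative assumptions). The left-hand alternative matches a parenthesized group and then deliberately fails, which makes the regex engine skip past that text; only the spaces outside the parentheses reach the right-hand alternative and get replaced.

```r
y <- "a b (c d) e"
# Anything inside (...) is matched and then discarded; remaining whitespace is replaced.
gsub("\\([^)]*\\)(*SKIP)(*FAIL)|\\s", "_", y, perl = TRUE)
## [1] "a_b_(c d)_e"
```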


Source: https://habr.com/ru/post/1690615/

