How to extract a substring using an inverse pattern in R?

I am trying to extract a substring that matches a pattern using the gsub() function in R.

# Example: extracting "7 years" substring.
string <- "Psychologist - 7 years on the website, online"
gsub(pattern="[0-9]+\\s+\\w+", replacement="", string)

[1] "Psychologist -  on the website, online"

As you can see, it is easy to remove the desired substring with gsub(), but I need to invert the result and get only "7 years". I am thinking of using "^", something like this:

gsub(pattern="[^[0-9]+\\s+\\w+]", replacement="", string)

Can someone please help me with the correct regex pattern?

2 answers

You can use

sub(pattern=".*?([0-9]+\\s+\\w+).*", replacement="\\1", string)

See this R demo.

More details

  • .*? - any 0+ characters, as few as possible
  • ([0-9]+\\s+\\w+) - Capture group 1:
    • [0-9]+ - one or more digits
    • \\s+ - 1 or more whitespace characters
    • \\w+ - 1 or more word characters (letters, digits or underscore)
  • .* - the rest of the line (any 0+ characters, as many as possible)

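The \\1 in the replacement refers to the value captured by Group 1, so the whole matched string is replaced with just that group.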
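As a possible alternative to the backreference approach (not part of the answer above), base R's regexpr() and regmatches() can return the matched text directly, so the pattern does not have to cover the rest of the string. A minimal sketch using the string from the question; the variable names via_sub and via_regmatches are only illustrative:

```r
string <- "Psychologist - 7 years on the website, online"

# sub() with a capture group: the pattern matches the whole string,
# which is then replaced by the contents of group 1
via_sub <- sub(pattern = ".*?([0-9]+\\s+\\w+).*", replacement = "\\1", string)

# regexpr() locates the first match; regmatches() returns the matched text itself
via_regmatches <- regmatches(string, regexpr("[0-9]+\\s+\\w+", string))

print(via_sub)
print(via_regmatches)
```

One difference worth knowing: when nothing matches, sub() returns the input unchanged, while regmatches() returns character(0).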


You can use \D, the inverse of \d, in R:

string <- "Psychologist - 7 years on the website, online"
sub(pattern = "\\D*(\\d+\\s+\\w+).*", replacement = "\\1", string)
# [1] "7 years"

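\\D* matches zero or more non-digit characters, as many as possible, so everything before the first digit is consumed; the digits, the whitespace and the following word are captured in group 1, and .* consumes the rest.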

The pattern can be tested on regex101.com.
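Since sub() is vectorized, the \\D* pattern should also work on a whole character vector. A small sketch, assuming a hypothetical vector profiles whose second element is made up for illustration:

```r
# Hypothetical input: the first element is from the question, the second is invented
profiles <- c("Psychologist - 7 years on the website, online",
              "Coach - 12 months on the website, offline")

# Keep only "<number> <word>" from each element
tenure <- sub(pattern = "\\D*(\\d+\\s+\\w+).*", replacement = "\\1", profiles)

# Capturing only the digits gives a value that can be converted to integer
amount <- as.integer(sub(pattern = "\\D*(\\d+).*", replacement = "\\1", profiles))

print(tenure)
print(amount)
```

Elements that contain no digits at all would be returned unchanged by sub(), so checking the input with grepl("\\d", profiles) first may be worthwhile.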


Source: https://habr.com/ru/post/1688204/