Extract string between multiple words using gsub

I am trying to isolate words from a string in R using gsub. I want to extract the last name, which appears between ")" and "(m)" (for men) or between ")" and "(f)" (for women). I'm struggling to do both in one line of code.

name<-c("Dr. T. (Tom) Bailey (m), UCL- Physics" , "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

malename<-gsub(".*\\) (.*) \\(m).*", "\\1", name)
femname<-gsub(".*\\) (.*) \\(f).*", "\\1", name)

The code above gives me the names for men and women separately, but ideally I want all the last names in one variable. This probably needs some kind of OR in the pattern (like (m) OR (f)), but I don't know how to write it.

2 answers

Put m and f into a character class (a bracket expression, in POSIX terms): [mf].

".*\\)\\s+(.*)\\s+\\([mf]\\).*"
                     ^^^^
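The "OR" from the question can also be spelled out as alternation, (m|f), instead of a character class. A minimal sketch, assuming the sample input from the question; the variable names pattern, last_name and gender are illustrative only. Because the alternation adds a second capture group, the matched gender letter is available as \\2:

```r
# Sample input from the question
name <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
          "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")

# Group 1 is the last name; group 2 is the gender letter, matched by alternation
pattern <- ".*\\)\\s+(.*)\\s+\\((m|f)\\).*"

last_name <- sub(pattern, "\\1", name)  # text captured by group 1
gender    <- sub(pattern, "\\2", name)  # text captured by group 2

print(last_name)
## [1] "Bailey" "Blue"
print(gender)
## [1] "m" "f"
```

For single characters, [mf] is the shorter form; alternation becomes useful when the alternatives are longer than one character.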


Since only one replacement per string is needed, sub is enough (gsub is not required):

name<-c("Dr. T. (Tom) Bailey (m), UCL- Physics" , "Dr. B.K. (Barbara) Blue (f), Oxford - Political Science")
res <- sub(".*\\)\\s+(.*)\\s+\\([mf]\\).*", "\\1", name)
res
## => [1] "Bailey" "Blue"  
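One thing to keep in mind: sub returns the input unchanged when the pattern does not match. A possible way to get NA in that case is to test with grepl first. This is only a sketch; the second element of x is a hypothetical entry that is not in the question's data:

```r
x <- c("Dr. T. (Tom) Bailey (m), UCL- Physics",
       "Prof. Smith, Cambridge")  # hypothetical entry without an "(m)" or "(f)" tag

pattern <- ".*\\)\\s+(.*)\\s+\\([mf]\\).*"

# Keep the captured last name where the pattern matches, NA elsewhere
last_name <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
print(last_name)
## [1] "Bailey" NA
```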

Another option with sub:

sub("^[^)]+\\)\\s+(\\w+).*", "\\1", name)
#[1] "Bailey" "Blue"  
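The two patterns give the same result on the question's data, but they can differ when a last name has more than one word. A small comparison, assuming a hypothetical entry (not from the question) and illustrative variable names:

```r
# Hypothetical entry with a two-word last name
x <- "Dr. A. (Ann) Van Dyke (f), MIT - Chemistry"

# Pattern from the first answer: captures everything up to the gender tag
full_name <- sub(".*\\)\\s+(.*)\\s+\\([mf]\\).*", "\\1", x)

# Pattern from the second answer: \\w+ stops at the first non-word character
first_word <- sub("^[^)]+\\)\\s+(\\w+).*", "\\1", x)

print(full_name)
## [1] "Van Dyke"
print(first_word)
## [1] "Van"
```

Which behaviour is preferable depends on the data; the \\w+ version also does not check that an (m) or (f) tag is present.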

Source: https://habr.com/ru/post/1661431/

