Extract text in parentheses in R

Question

Two related questions. I have text data vectors like

"a(b)jk(p)" "ipq" "e(ijkl)"

and I want to easily split them into a vector containing the text OUTSIDE the parentheses:

 "ajk" "ipq" "e"

and a vector containing the text INSIDE the parentheses:

 "bp" "" "ijkl"

Is there an easy way to do this? A further complication is that the strings can get quite long and contain a large (unlimited) number of parenthesized groups, so I cannot just grab the text before/after a single pair of parentheses; I need a smarter solution.

string text vector r stringr

user2817329 Mar 10 '15 at 2:53

2 answers

The rm_round function in the qdapRegex package I maintain was born to do this:

First we'll get and load the package via pacman:

 if (!require("pacman")) install.packages("pacman")
 pacman::p_load(qdapRegex)

Then we can use it to remove and extract the desired parts:

 x <- c("a(b)jk(p)", "ipq", "e(ijkl)")
 rm_round(x)
 ## [1] "ajk" "ipq" "e"
 rm_round(x, extract=TRUE)
 ## [[1]]
 ## [1] "b" "p"
 ##
 ## [[2]]
 ## [1] NA
 ##
 ## [[3]]
 ## [1] "ijkl"

To collapse "b" and "p" into a single string per element, use:

 sapply(rm_round(x, extract=TRUE), paste, collapse="")
 ## [1] "bp" "NA" "ijkl"
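In the output above, the element with no parentheses collapses to the string "NA" rather than the empty string the question asks for. A minimal base-R sketch that drops the NA before collapsing, assuming the extracted list has the shape printed above (the list is typed in by hand so the snippet runs without qdapRegex; `extracted` is a hypothetical name):

```r
# Hand-typed stand-in for the list returned by rm_round(x, extract = TRUE)
# (assumed shape, copied from the printed output above).
extracted <- list(c("b", "p"), NA_character_, "ijkl")

# Drop NA placeholders before collapsing, so elements without
# parentheses become "" instead of the string "NA".
inside <- vapply(
  extracted,
  function(v) paste(v[!is.na(v)], collapse = ""),
  character(1)
)
inside  # c("bp", "", "ijkl")
```

Using vapply() with character(1) also guarantees a plain character vector comes back, one string per input element.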

Tyler Rinker Mar 10 '15 at 4:44

Avinash Raj · Accepted Answer · 2015-03-10T03:50:11+0000

Text outside the parentheses:

 > x <- c("a(b)jk(p)", "ipq", "e(ijkl)")
 > gsub("\\([^()]*\\)", "", x)
 [1] "ajk" "ipq" "e"
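Because [^()]* cannot contain a parenthesis, this pattern only removes innermost groups. That is fine for the flat strings in the question, but if nested parentheses can occur (an assumption beyond the question), a single gsub() pass on "a(b(c))d" leaves "a(b)d". One option is to repeat the substitution until nothing changes; `strip_parens` is a hypothetical helper name, not from either answer:

```r
# Hypothetical helper: repeatedly remove innermost "(...)" groups
# until the strings stop changing, so nested groups are removed too.
strip_parens <- function(x) {
  repeat {
    y <- gsub("\\([^()]*\\)", "", x)
    if (identical(y, x)) return(y)  # nothing left to remove
    x <- y
  }
}

gsub("\\([^()]*\\)", "", "a(b(c))d")              # single pass: "a(b)d"
strip_parens(c("a(b(c))d", "a(b)jk(p)", "ipq"))  # c("ad", "ajk", "ipq")
```

Each pass only deletes characters, so the loop always terminates; for non-nested input it stops after the second pass with the same result as the single gsub() call.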

Text inside the parentheses:

 > x <- c("a(b)jk(p)", "ipq", "e(ijkl)")
 > gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl=TRUE)
 [1] "bp" "" "ijkl"

(?<=\\()[^()]*(?=\\)) matches all the characters that sit inside a pair of parentheses, and (*SKIP)(*F) then forces that match to fail and makes the regex engine skip past those characters instead of retrying them. The engine then tries the alternative after the | against the rest of the string, so the dot . matches every character that was not skipped, including the parentheses themselves. Replacing all of those matches with an empty string leaves only the text inside the parentheses. The lookbehind and the (*SKIP)(*F) verbs are PCRE features, which is why perl=TRUE is needed.
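The lookaround part of that pattern can also be used on its own with base R's gregexpr() and regmatches() to get one vector of pieces per input element, similar in shape to rm_round(x, extract=TRUE). A sketch using the same x (`pieces` and `inside` are illustrative names):

```r
x <- c("a(b)jk(p)", "ipq", "e(ijkl)")

# Match only the text between "(" and ")" with the same lookbehind/lookahead
# pair as above, without the (*SKIP)(*F)|. part.
pieces <- regmatches(x, gregexpr("(?<=\\()[^()]*(?=\\))", x, perl = TRUE))
# pieces is a list: c("b", "p"), character(0), "ijkl"

# Collapse each element; character(0) collapses to "".
inside <- vapply(pieces, paste, character(1), collapse = "")
inside  # c("bp", "", "ijkl")
```

Unlike the printed rm_round() output, elements without parentheses come back as character(0), so no NA handling is needed before collapsing.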

 > gsub("\\(([^()]*)\\)|.", "\\1", x, perl=TRUE)
 [1] "bp" "" "ijkl"

This regular expression captures the characters inside each pair of parentheses into group 1 and matches every other character individually: the |. alternative matches any single character that is not part of a parenthesized group. Replacing each match with the contents of group 1 (which is empty for the single-character matches) therefore leaves only the text inside the parentheses.
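Since the question asks for both vectors, the two gsub() calls can be wrapped together. `split_parens` is a hypothetical helper, not part of either answer; it simply applies the outside-text pattern and the capture-group pattern and returns the results as columns of a data frame:

```r
# Hypothetical helper: apply both gsub() patterns from this answer
# and return the original text alongside the two requested vectors.
split_parens <- function(x) {
  data.frame(
    text    = x,
    outside = gsub("\\([^()]*\\)", "", x),                      # drop "(...)" groups
    inside  = gsub("\\(([^()]*)\\)|.", "\\1", x, perl = TRUE),  # keep group 1 only
    stringsAsFactors = FALSE
  )
}

res <- split_parens(c("a(b)jk(p)", "ipq", "e(ijkl)"))
res$outside  # c("ajk", "ipq", "e")
res$inside   # c("bp", "", "ijkl")
```

stringsAsFactors = FALSE keeps the columns as character vectors on R versions before 4.0, where the default was to convert them to factors.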
