Extract numbers between brackets inside a string

Question


Possible duplicate:
Extract information in all brackets in R (regex)

I imported data from Excel, and one column consists of long strings that contain numbers and letters. Is there a way to extract only the number from each of these strings and save it in a new variable? Unfortunately, some of the entries have two sets of brackets, and I would like only the second. Can I use grep for this?

The lines look something like this (their length varies):

"East Kootenay C (5901035) RDA 01011"

or like this:

 "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"

All I want from these is 5901035 and 5933039.

Any hints and help would be greatly appreciated.


regex r

Sandra Oct 4 '12 at 20:21


2 answers

Justin · Answer 1 · 2012-10-04T20:42:40+0000

There are many possible regular expressions for this. Here is one of them:

    x <- c("East Kootenay C (5901035) RDA 01011",
           "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
    gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
    # [1] "5901035" "5933039"

Let's break down the syntax of that expression, '.+\\(([0-9]+)\\).+?$':

.+ one or more of any character.
\\( parentheses are special characters in a regular expression, so if I want to match a literal ( I need to escape it with \. I have to escape that backslash again for R (hence the two \s). The same goes for \\).
([0-9]+) I mentioned special characters; here I use two kinds. The first is the parentheses, which mark the group I want to keep. The second is the square brackets [ and ], which enclose a character class (here any digit from 0 to 9; the + means one or more of them). See ?regex for more information.
.+?$ The final piece matches the rest of the line, up to the end ($). It is the greedy .+ at the start that ensures I grab the LAST set of numbers in parentheses: it consumes as much of the string as it can and then backs off only as far as the last "(digits)" group.
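To check the point about greedy matching, here is a quick sketch with a made-up string in which both sets of parentheses contain digits (in the question's data only one set does). It passes perl = TRUE, an assumption beyond the answer above, so that the backtracking follows the familiar Perl rules; the lazy ^.+? variant is included only for contrast.

```r
# A made-up string where BOTH sets of parentheses hold digits.
s <- "Region A (123) District B (456) RDA 01011"

# Greedy leading .+ eats as much as it can, then backs off to the LAST "(digits)".
last <- sub(".+\\(([0-9]+)\\).+?$", "\\1", s, perl = TRUE)

# Making the leading part lazy (.+?) stops at the FIRST "(digits)" instead.
first <- sub("^.+?\\(([0-9]+)\\).*$", "\\1", s, perl = TRUE)

last
# [1] "456"
first
# [1] "123"
```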

I could also use * instead of + (in both .+ parts), which means zero or more rather than one or more, in case the bracketed number comes at the very beginning or end of the string.
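A small sketch of the + versus * difference, using two made-up strings where the bracketed number sits at the very start or the very end of the line:

```r
# Made-up examples: "(digits)" at the start, and at the end, of the string.
s <- c("(5901035) RDA 01011", "East Kootenay C (5901035)")

# With + there must be at least one character before "(" and after ")",
# so neither string matches and sub() returns them unchanged.
with_plus <- sub(".+\\(([0-9]+)\\).+?$", "\\1", s)

# With * the surrounding text may be empty, so both strings match.
with_star <- sub(".*\\(([0-9]+)\\).*$", "\\1", s)

with_plus
# [1] "(5901035) RDA 01011"       "East Kootenay C (5901035)"
with_star
# [1] "5901035" "5901035"
```

Note that sub() and gsub() return a string unchanged when the pattern does not match, which is why with_plus simply echoes its input.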

The second argument of gsub is what I replace the matched text with. I used \\1, which means "use group 1" (the text captured by the ( ) above). As with \\(, the backslash is doubled so that R passes a single \ through to the regular expression engine.
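Since the question asks to save the result in a new variable, here is one way to do that, assuming the imported sheet is a data frame called df with a character column region (both names are hypothetical). A third, made-up row without any bracketed number is included to show that it becomes NA rather than being copied through unchanged, which is what sub() on its own would do. The pattern uses * and perl = TRUE, a small variation on the answer's regex.

```r
# Hypothetical stand-in for the imported Excel data; `df` and `region` are assumed names.
df <- data.frame(
  region = c("East Kootenay C (5901035) RDA 01011",
             "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020",
             "No code here"),
  stringsAsFactors = FALSE
)

# Greedy .* before "(" lands on the LAST "(digits)" group; \\1 keeps only the digits.
pattern <- ".*\\(([0-9]+)\\).*"

# Rows without a bracketed number get NA instead of their original text.
has_code <- grepl(pattern, df$region, perl = TRUE)
df$code <- ifelse(has_code,
                  sub(pattern, "\\1", df$region, perl = TRUE),
                  NA_character_)

df$code
# [1] "5901035" "5933039" NA
```

The codes are kept as character; as.numeric(df$code) is an easy follow-up if you need them as numbers.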

Clear as mud, I'm sure! Enjoy the data collection project!

G. Grothendieck · Answer 2 · 2012-10-04T23:00:41+0000

Here is a solution using the gsubfn package:

    library(gsubfn)
    strapplyc(x, "[(](\\d+)[)]", simplify = TRUE)

[(] matches an opening parenthesis, (\\d+) matches a run of digits and, because of the parentheses around it, captures it as a back-reference, and finally [)] matches a closing parenthesis. Only the captured back-reference (the digits) is returned.
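If you would rather not install gsubfn, a base R alternative (a different technique, not part of the gsubfn answer) is regmatches() with gregexpr(). The sketch below uses Perl lookarounds so that only the digits are returned, then keeps the last match in each string to honour the "only the second set of brackets" requirement.

```r
x <- c("East Kootenay C (5901035) RDA 01011",
       "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")

# Every run of digits preceded by "(" and followed by ")": a list, one element per string.
m <- regmatches(x, gregexpr("(?<=\\()[0-9]+(?=\\))", x, perl = TRUE))

# Keep the last match per string; strings with no match become NA.
last_code <- vapply(m,
                    function(s) if (length(s) > 0) s[length(s)] else NA_character_,
                    character(1))

last_code
# [1] "5901035" "5933039"
```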
