There are many possible regular expressions for this. Here is one of them:
x=c("East Kootenay C (5901035) RDA 01011","Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
> gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
[1] "5901035" "5933039"
Let's break down the syntax of that first expression, '.+\\(([0-9]+)\\).+?$':
.+ one or more of anything
\\( parentheses are special characters in a regular expression, so to match a literal ( I need to escape it with \ . I have to escape it again for R (hence the two \ s). The same goes for \\) .
([0-9]+) I mentioned special characters; here I use two kinds. The first is the parentheses, which mark the group I want to keep. The second, [ and ], surrounds a set of characters to match (here the digits 0 through 9). See ?regex for more information.
.+?$ The last piece matches the rest of the line up to its end ( $ ). Because the leading .+ is greedy, the match lands on the LAST set of numbers in parens, as noted in the comments.
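To check the "last set of numbers" behaviour, here is a small sketch on a made-up string with two parenthesised numbers. The input y and the perl = TRUE argument are my assumptions, not part of the question; perl = TRUE is there only so the greedy and lazy quantifiers follow the PCRE rules described in ?regex.

```r
# Made-up input with two parenthesised numbers (not from the question).
y <- "Area A (1111111) (2222222) RDA 03030"

# Greedy leading .+ backtracks from the right, so the last "(digits)" wins.
last <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', y, perl = TRUE)

# A lazy leading .+? stops at the first "(digits)" instead.
first <- gsub('.+?\\(([0-9]+)\\).+$', '\\1', y, perl = TRUE)

print(last)   # "2222222"
print(first)  # "1111111"
```

So if you ever want the first number rather than the last, making the leading quantifier lazy is one way to get it.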
I could also have used * instead of + , which would mean 0 or more rather than 1 or more, in case your parenthesised number comes at the very beginning or end of a line.
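A quick sketch of the difference between + and * , using a made-up string z that contains nothing but the parenthesised number. The star variant below also drops the lazy ? from the tail for simplicity; that is my simplification, not the pattern used above.

```r
# Made-up input: nothing before or after the parentheses.
z <- "(5901035)"

# With + the pattern needs at least one character on each side,
# so nothing matches and gsub returns the string unchanged.
plus <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', z)

# With * both sides may be empty, so the digits are extracted.
star <- gsub('.*\\(([0-9]+)\\).*$', '\\1', z)

print(plus)  # "(5901035)"
print(star)  # "5901035"
```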
The second argument of gsub is what I replace the matched text with. I used \\1 , which means "use group 1" (the stuff inside the ( ) from above). I need to escape it twice again, once for the regular expression and once for R.
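To make the double escaping and the group reference a bit more concrete, here is a small sketch. The "ID=" prefix is just an illustration I made up to show that the replacement can mix literal text with the captured group.

```r
x <- c("East Kootenay C (5901035) RDA 01011",
       "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")

# In R source '\\1' is a two-character string: a backslash and the digit 1.
n <- nchar('\\1')
cat('\\1\n')  # prints \1, which is what the regex engine actually sees

# The replacement can combine literal text with the captured group.
ids <- gsub('.+\\(([0-9]+)\\).+?$', 'ID=\\1', x)
print(ids)  # "ID=5901035" "ID=5933039"
```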
Clear as mud, to be sure! Enjoy the data collection project!