R: how to make grep return a match, not the whole string

I have what is probably a really dumb question about grep in R. Apologies, because this seems like it should be simple; I'm obviously just missing something.

I have a vector of strings; let's call it alice. Here is part of alice:

 T.8EFF.SP.OT1.D5.VSVOVA#4
 T.8EFF.SP.OT1.D6.LISOVA#1
 T.8EFF.SP.OT1.D6.LISOVA#2
 T.8EFF.SP.OT1.D6.LISOVA#3
 T.8EFF.SP.OT1.D6.VSVOVA#4
 T.8EFF.SP.OT1.D8.VSVOVA#3
 T.8EFF.SP.OT1.D8.VSVOVA#4
 T.8MEM.SP#1
 T.8MEM.SP#3
 T.8MEM.SP.OT1.D106.VSVOVA#2
 T.8MEM.SP.OT1.D45.LISOVA#1
 T.8MEM.SP.OT1.D45.LISOVA#3

I want grep to give me the number after the "D" that appears in some of these strings, provided that the string contains "LIS", and an empty string or something similar otherwise.

I was hoping grep would return the value of the capture group rather than the whole string. Here is my R-flavored regex:

 pattern <- "(?<=\\.D)([0-9]+)(?=.LIS)"

Nothing too complicated. But to get what I'm after, rather than just using grep(pattern, alice, value = TRUE, perl = TRUE), I'm doing the following, which seems clunky:

 reg.out <- regexpr("(?<=\\.D)[0-9]+(?=.LIS)", alice, perl = TRUE)
 substr(alice, reg.out, reg.out + attr(reg.out, "match.length") - 1)
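For comparison, base R also provides regmatches(), which pulls out the matched text from a regexpr() result without the manual substr() arithmetic. The following is only a sketch on a shortened, four-element stand-in for alice; the names hits and result are made up for illustration.

```r
# Shortened stand-in for the real vector (assumption: four elements suffice to illustrate)
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4", "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1", "T.8MEM.SP.OT1.D45.LISOVA#3")

pattern <- "(?<=\\.D)[0-9]+(?=.LIS)"
reg.out <- regexpr(pattern, alice, perl = TRUE)

# regmatches() returns only the matched substrings; non-matching elements are dropped
hits <- regmatches(alice, reg.out)
print(hits)    # "6" "45"

# To keep one result per input element, start from empty strings
# and fill in the positions where regexpr() found a match
result <- character(length(alice))
result[reg.out != -1] <- hits
print(result)  # "" "6" "" "45"
```

Note that regmatches() on its own shortens the vector, so the extra fill step is only needed when the positions of the non-matching strings matter.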

Looking at it now, it doesn't seem too ugly, but the amount of fiddling it took to get this completely trivial thing working was embarrassing. Any pointers on how to do this properly?

Bonus points for pointing me to a web page that explains the difference between what I access with $, @, and attr.
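For readers with the same side question, here is a rough sketch of how the three accessors differ in practice: $ extracts a named element of a list (or a column of a data frame), attr() reads metadata attached to any object, and @ reads a slot of an S4 object. The list lst and the S4 class Point are hypothetical names invented for this illustration.

```r
library(methods)  # needed for S4 classes when running under Rscript on older R versions

# $ : extract a named element from a list (or a column from a data frame)
lst <- list(a = 1, b = "two")
print(lst$a)

# attr() : read an attribute attached to an object;
# regexpr() stores the match lengths this way
reg <- regexpr("[0-9]+", "D45")
print(attr(reg, "match.length"))

# @ : read a slot of an S4 object (Point is a made-up class)
setClass("Point", representation(x = "numeric", y = "numeric"))
p <- new("Point", x = 1, y = 2)
print(p@x)
```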

+41
grep r
Jun 03 '10 at 19:58
2 answers

You can do something like this:

 pat <- ".*\\.D([0-9]+)\\.LIS.*"
 sub(pat, "\\1", alice)

If you only want the subset of alice where your pattern matches, try the following:

 pat <- ".*\\.D([0-9]+)\\.LIS.*"
 sub(pat, "\\1", alice[grepl(pat, alice)])
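One thing to keep in mind is that sub() returns non-matching strings unchanged, while the question asks for an empty string in that case. A possible way to get that, sketched on a shortened stand-in for alice (the name days is made up for illustration), is to guard the substitution with grepl():

```r
# Shortened stand-in for the real vector (assumption: four elements suffice to illustrate)
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4", "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1", "T.8MEM.SP.OT1.D45.LISOVA#3")

pat <- ".*\\.D([0-9]+)\\.LIS.*"

# Where the pattern matches, keep only capture group 1; elsewhere return ""
days <- ifelse(grepl(pat, alice), sub(pat, "\\1", alice), "")
print(days)  # "" "6" "" "45"
```

This keeps the result the same length as alice, so it can be stored alongside the original strings.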
+34
Jun 03

Try the stringr package:

 library(stringr)
 str_match(alice, ".*\\.D([0-9]+)\\.LIS.*")[, 2]
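str_match() returns a matrix with the full match in column 1 and the capture group in column 2, with NA in rows that do not match. If installing a package is not an option, roughly the same result can be sketched in base R with regexec() and regmatches(); the shortened alice and the name group1 below are assumptions made for the example.

```r
# Shortened stand-in for the real vector (assumption: four elements suffice to illustrate)
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4", "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1", "T.8MEM.SP.OT1.D45.LISOVA#3")

pat <- ".*\\.D([0-9]+)\\.LIS.*"

# regexec() records the positions of the full match and of each capture group;
# regmatches() turns them into a list with one character vector per input string:
# character(0) for no match, otherwise c(full match, group 1)
m <- regmatches(alice, regexec(pat, alice))

# Take capture group 1 where there is a match, NA otherwise
group1 <- vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
                 character(1))
print(group1)  # NA "6" NA "45"
```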
+44
Jun 03