R regexp - extracting a 5-digit number

Question

I have a string a similar to this:

stundenwerte_FF_00691_19260101_20131231_hist.zip

and I’d like to extract the 5-digit number “00691” from it.

I tried gregexpr with regmatches, as well as stringr::str_extract, but couldn't figure out the correct regexp. I got as far as:

 gregexpr("[:digits{5}:]", a)

which I expected to return the 5-digit numbers, but it does not work and I don’t understand how to fix it:

 m <- gregexpr("[:digits{5}:]", a)
 regmatches(a, m)

Thanks for your help in advance!

+5

regex r

Rentrop Oct 25 '14 at 2:27

4 answers

1) sub

 sub(".*_(\\d{5})_.*", "\\1", x) ## [1] "00691"

2) gsubfn::strapplyc The regular expression can be simplified slightly if we use strapplyc:

 library(gsubfn)
 strapplyc(x, "_(\\d{5})_", simplify = TRUE) ## [1] "00691"

3) read.table If we know that the number is the third underscore-separated field:

 read.table(text = x, sep = "_", colClasses = "character")$V3 ## [1] "00691"

3a) or, with strsplit:

 strsplit(x, "_")[[1]][3] ## [1] "00691"
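
If several file names need processing at once, approaches 1) and 3a) can be applied to a whole character vector. The following is a minimal sketch; the second file name is invented for illustration and only assumed to follow the same pattern as the one in the question.

```r
# Hypothetical file names following the pattern from the question
files <- c("stundenwerte_FF_00691_19260101_20131231_hist.zip",
           "stundenwerte_FF_01048_19340101_20131231_hist.zip")

# 1) sub() is vectorized over its input
ids_sub <- sub(".*_(\\d{5})_.*", "\\1", files)

# 3a) strsplit() returns one list element per file name;
#     vapply() picks the third field from each element
ids_split <- vapply(strsplit(files, "_"), function(p) p[3], character(1))

# Both approaches should agree as long as the number is always the third field
identical(ids_sub, ids_split)
```

The strsplit variant depends on the field position, while the sub variant depends on the number being exactly five digits between underscores, so which one is safer depends on how regular the file names are.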

+5

G. grothendieck Oct 25 '14 at 2:47

You can try the following regex, which uses negative lookaround assertions. We cannot use word boundaries here, as in \\b\\d{5}\\b , because the preceding and following _ characters are themselves matched by \w , so there is no word boundary between an underscore and a digit.

 > x <- "stundenwerte_FF_00691_19260101_20131231_hist.zip"
 > m <- regexpr("(?<!\\d)\\d{5}(?!\\d)", x, perl=TRUE)
 > regmatches(x, m)
 [1] "00691"
 > m <- gregexpr("(?<!\\d)\\d{5}(?!\\d)", x, perl=TRUE)
 > regmatches(x, m)[[1]]
 [1] "00691"

Explanation:

(?<!\\d) A negative lookbehind asserts that the match is not preceded by a digit.
\\d{5} Match exactly 5 digits.
(?!\\d) A negative lookahead asserts that the match is not followed by a digit.
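
To see the difference between the two patterns on the string from the question, a small comparison along these lines may help. It is only a sketch; the variable names with_boundary and with_lookaround are made up for this example.

```r
x <- "stundenwerte_FF_00691_19260101_20131231_hist.zip"

# Word boundaries: "_" counts as a word character, so there is
# no boundary between "_" and a digit and nothing matches
with_boundary <- regmatches(x, gregexpr("\\b\\d{5}\\b", x, perl = TRUE))[[1]]

# Lookarounds: only require that the neighbours are not digits,
# which also rules out 5-digit pieces of the 8-digit dates
with_lookaround <- regmatches(x, gregexpr("(?<!\\d)\\d{5}(?!\\d)", x, perl = TRUE))[[1]]

length(with_boundary)  # 0
with_lookaround        # "00691"
```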

+4

Avinash raj Oct 25 '14 at 2:29

Let the string be:

 ss = "stundenwerte_FF_00691_19260101_20131231_hist.zip"

You can split the string on _ and unlist the resulting substrings:

 ll = unlist(strsplit(ss, '_'))

Then build a logical index that is TRUE for the substrings that are 5 characters long:

 idx = sapply(ll, nchar) == 5

And select those substrings:

 ll[idx]
 [1] "00691"
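
Note that the length check alone would also accept a 5-character field that is not a number, such as "abcde". A possible variation, shown here as a sketch rather than as part of the original answer, filters the fields with grepl so that only fields made of exactly five digits are kept:

```r
ss <- "stundenwerte_FF_00691_19260101_20131231_hist.zip"
ll <- unlist(strsplit(ss, "_"))

# Keep only the fields that consist of exactly five digits
five_digit <- ll[grepl("^[0-9]{5}$", ll)]
five_digit  # "00691"
```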

+1

rnso Oct 25 '14 at 5:06

hwnd · Accepted Answer · 2014-10-25T02:31:12+0000

You can just use sub to capture the digits; for this simple case there is no need for regmatches, IMO.

 x <- 'stundenwerte_FF_00691_19260101_20131231_hist.zip'
 sub('\\D*(\\d{5}).*', '\\1', x)
 # [1] "00691"

Edit: If you have other strings that contain digits before the 5-digit number, modify the expression slightly:

 sub('.*_(\\d{5})_.*', '\\1', x)
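
One thing to keep in mind with sub is that it returns the input unchanged when the pattern does not match, so a file name without a 5-digit field comes back whole instead of being flagged. A minimal sketch of a guard follows; extract5 is a hypothetical helper name and "readme.txt" is an invented example input.

```r
# extract5 is a hypothetical helper, not part of base R
extract5 <- function(x) {
  pattern <- ".*_(\\d{5})_.*"
  # sub() leaves non-matching strings untouched, so test for a match first
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

res <- extract5(c("stundenwerte_FF_00691_19260101_20131231_hist.zip",
                  "readme.txt"))
res  # "00691" NA
```

Returning NA for non-matching inputs makes missing numbers easy to spot with is.na afterwards.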
