Extract year number from string surrounded by special characters

What is a good way to extract only the number 2007 from the following string?

some_string <- "1_2_start_2007_3_end"

The pattern that identifies the year in my case is:

  • 4 digits
  • surrounded by "_"

I am new to using regular expressions. I tried the following:

 regexp <- "_+[0-9]+_"
 names <- str_extract(files, regexp)

But this does not enforce that there are always exactly 4 digits, and the result includes the underscores.

4 answers

We can use a regex lookbehind to anchor on the _ and extract the 4 digits that follow it:

library(stringr)
str_extract(some_string, "(?<=_)\\d{4}")
#[1] "2007"

If the _ appears both before and after the 4 digits, then also use a regex lookahead:

str_extract(some_string, "(?<=_)\\d{4}(?=_)")
#[1] "2007"
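
For readers who prefer not to load stringr, the same lookaround pattern can be sketched in base R. This is a minimal sketch, assuming the same some_string as in the question; the variable names m and year are only illustrative.

```r
some_string <- "1_2_start_2007_3_end"

# perl = TRUE is needed for lookbehind/lookahead in base R regex functions
m <- regexpr("(?<=_)\\d{4}(?=_)", some_string, perl = TRUE)

# regmatches() returns only the matched text, without the underscores
year <- regmatches(some_string, m)
year
#[1] "2007"
```

Note that regmatches drops elements that have no match, so on a vector of file names the result can be shorter than the input.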

You can also use sub:

some_string <- "1_2_start_2007_3_end"
sub(".*_(\\d{4})_.*", "\\1", some_string)

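Pattern details:

  • .* - any 0+ chars, as many as possible
  • _ - a _ char
  • (\\d{4}) - Group 1 (referred to with \1 in the replacement): 4 digits
  • _.* - a _, then any 0+ chars up to the end of the string.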

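Note: akrun's str_extract(some_string, "(?<=_)\\d{4}") extracts the leftmost occurrence, while sub(".*_(\\d{4})_.*", "\\1", some_string) extracts the rightmost occurrence of a 4-digit substring enclosed in _. To make the sub solution return the leftmost one, use a lazy quantifier on the first .*: sub(".*?_(\\d{4})_.*", "\\1", some_string).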

R test:

some_string <- "1_2018_start_2007_3_end"
sub(".*?_(\\d{4})_.*", "\\1", some_string) # leftmost
## -> 2018
sub(".*_(\\d{4})_.*", "\\1", some_string) # rightmost
## -> 2007
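
One caveat with the sub approach: when the pattern does not match, sub returns the input string unchanged. A minimal sketch of guarding against that follows; extract_year is a hypothetical helper name, not part of the original answer.

```r
# Hypothetical helper: returns NA instead of the unchanged input on no match
extract_year <- function(x) {
  pattern <- ".*_(\\d{4})_.*"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

extract_year(c("1_2_start_2007_3_end", "no_year_here"))
#[1] "2007" NA
```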

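Without regex, we can split the string on _ and convert the pieces to numeric. All non-numeric pieces become NA, which are removed with !is.na. Then keep the element whose nchar is 4.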

i1 <- as.numeric(strsplit(some_string, '_')[[1]])
i1 <- i1[!is.na(i1)]

i1[nchar(i1) == 4]
#[1] 2007
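
A variation on the split approach, as a sketch: test the pieces as text instead of converting them first. This avoids the coercion warning from as.numeric and keeps a piece such as 0123 at four characters. The names parts and years are illustrative only.

```r
some_string <- "1_2_start_2007_3_end"

# fixed = TRUE splits on the literal underscore
parts <- strsplit(some_string, "_", fixed = TRUE)[[1]]

# keep only the pieces that consist of exactly four digits
years <- parts[grepl("^[0-9]{4}$", parts)]
years
#[1] "2007"
```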

You can also try this regular expression:

\S.*_(\d{4})_\S.*

It means,

  • a non-whitespace character followed by any number of characters (\S.*),
  • then _
  • followed by four digits (\d{4})
  • the four digits (your year) are captured using ()
  • another _
  • a non-whitespace character followed by the rest of the string (\S.*)

Since you mentioned that you are a beginner, please test this and all the other answers at https://regex101.com/. It is a very good site for learning regex, and it explains in detail what your regex does.

If you just need the year, then the regular expression below is enough:

_(\d{4})_
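
To use this shorter pattern from R, the backslash has to be doubled inside the string literal, and the captured group has to be pulled out of the match. A minimal base R sketch, assuming the some_string from the question:

```r
some_string <- "1_2_start_2007_3_end"

# regexec() reports the full match and each capture group
m <- regmatches(some_string, regexec("_(\\d{4})_", some_string))

# element 1 is the full match "_2007_", element 2 is the first group
year <- m[[1]][2]
year
#[1] "2007"
```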

Source: https://habr.com/ru/post/1693675/

