Extract a numeric pattern from a string in R

I'm relatively new to regex and I'm at a dead end. I have a data frame with a column that looks like this:

year1
GMM14_2000_NGVA
GMM14_2001_NGVA
GMM14_2002_NGVA
...
GMM14_2014_NGVA

I am trying to extract the year from the middle of each string (2000, 2001, etc.). This is my code so far:

gsub("[^0-9]","",year1)

which returns the digits, but that includes the 14 from the prefix:

142000
142001

Any idea how to exclude the 14 from the pattern, or how to extract the year more efficiently?

thanks

+4
5 answers

Use the following gsub:

s  = "GMM14_2002_NGVA"
gsub("^[^_]*_|_[^_]*$", "", s)

See the IDEONE demo

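Regular expression breakdown:

Match ...

  • ^[^_]*_ - 0 or more characters other than _ from the beginning of the string, followed by a _
  • | - or...
  • _[^_]*$ - a _ followed by 0 or more characters other than _, up to the end of the string

... and replace the matches with an empty string.

Alternatively, use str_extract from stringr: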

library(stringr)
str_extract(s,"(?<=_)\\d{4}(?=_)")

The Perl-style regex matches a 4-digit substring that is enclosed by underscores.
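To apply the gsub approach to the whole column rather than a single string, something like the following sketch may help. The data frame name df and the new column name year are assumptions (the question only names the column year1), and as.integer is added here only to show a numeric conversion.

```r
# Hypothetical data frame; only the column name year1 comes from the question
df <- data.frame(
  year1 = c("GMM14_2000_NGVA", "GMM14_2001_NGVA", "GMM14_2014_NGVA"),
  stringsAsFactors = FALSE
)

# gsub is vectorised, so the same pattern works on every row at once:
# it strips everything up to the first _ and everything from the last _
df$year <- as.integer(gsub("^[^_]*_|_[^_]*$", "", df$year1))
df$year
```

This relies on each string containing exactly two underscores; with more parts, the middle pieces would all be kept.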

+5

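You can use stringi as well. The year is a 4-digit number, so you can extract a run of 4 digits. In this data the first and the last such match are the same.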

library(stringi)

x <- c("GMM14_2000_NGVA", "GMM14_2001_NGVA")

stri_extract_last(x, regex = "\\d{4}")
#[1] "2000" "2001"

stri_extract_first(x, regex = "\\d{4}")
#[1] "2000" "2001"
+6

A base-R option with strsplit, using the sample data from @jazzurro:

x <- c("GMM14_2000_NGVA", "GMM14_2001_NGVA")

vapply(strsplit(x, '_'), function(x) x[2], character(1))
#[1] "2000" "2001"

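strsplit splits each element of x at the _ characters and returns a list of the same length as x. vapply then takes the second piece of each list element and returns the results as a character vector.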
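To see the two steps separately, here is a small sketch. The variable names parts and years are made up for illustration, and fixed = TRUE is my addition so that the underscore is treated as a literal string rather than a regex.

```r
x <- c("GMM14_2000_NGVA", "GMM14_2001_NGVA")

# strsplit returns a list with one character vector per input string
parts <- strsplit(x, "_", fixed = TRUE)
str(parts)

# vapply picks the second piece and checks that each result is a single string
years <- vapply(parts, function(p) p[2], character(1))
years
```

If a string has no underscore, p[2] is NA, so the result still has one entry per input.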

+2

You can also use sub.

sub(".*_(\\d{4})_.*", "\\1", x)

devtools::install_github("Avinash-Raj/dangas")
library(dangas)
extract_a("_", "_", x)

This extracts the characters found between the start and end delimiters.

Syntax:

extract_a(start, end, string)
+2

R, .

.

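regmatches in R:

Use regmatches to get the actual substrings matched by the regular expression. As the first argument, pass the same input that you passed to regexpr or gregexpr. As the second argument, pass the result returned by regexpr or gregexpr. With a result from regexpr, regmatches returns a character vector of the matched strings, which may be shorter than the input if some elements had no match. With a result from gregexpr, regmatches returns a list with one element per input string. Each element is a character vector of all the matches in that string, or an empty character vector if there were none.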

> x <- c("abc", "def", "cba a", "aa")
> m <- regexpr("a+", x, perl=TRUE)
> regmatches(x, m)
[1] "a"  "a"  "aa"
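For comparison, a sketch of the same example with gregexpr instead of regexpr. The variable name all_matches is made up for illustration.

```r
x <- c("abc", "def", "cba a", "aa")

# gregexpr finds every match, so regmatches returns a list as long as x
all_matches <- regmatches(x, gregexpr("a+", x, perl = TRUE))
all_matches

# Number of matches per input string; "def" has none
lengths(all_matches)
```

Unlike the regexpr version, the element for "def" is kept (as an empty vector), so positions still line up with x.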

In your case:

m <- regexpr("\\d{4}", year1, perl=TRUE)
regmatches(year1, m)

If another run of 4 digits can appear in the same string, you can tie the match to the surrounding underscores with non-capturing groups. Probably like this:

"(?:_)\\d{4}(?:_)"

Sorry, I have no way to test any of this in R.
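A quick check of that pattern, as a sketch: non-capturing groups still consume the underscores, so they end up in the result, whereas lookarounds only assert them. The sample vector and the names with_groups and with_lookaround are assumptions for illustration.

```r
year1 <- c("GMM14_2000_NGVA", "GMM14_2001_NGVA")

# Non-capturing groups are still part of the match, so the underscores are kept
with_groups <- regmatches(year1, regexpr("(?:_)\\d{4}(?:_)", year1, perl = TRUE))
with_groups

# Lookarounds only assert the underscores, so just the digits are returned
with_lookaround <- regmatches(year1, regexpr("(?<=_)\\d{4}(?=_)", year1, perl = TRUE))
with_lookaround
```

Both patterns need perl = TRUE, since (?: and lookarounds are PCRE syntax.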

0

Source: https://habr.com/ru/post/1609814/

