R: how to find the first digit in a string

string = "ABC3JFD456" 

Suppose I have the string above and want to find the first digit in it and save its value. In this case, I would like to keep the value 3, since that is the first digit in the string. grepl("\\d", string) only returns a boolean; it tells me nothing about where the first digit is or what it is. What regular expression should I use to get the value of the first digit?

+6
6 answers

Base R

 regmatches(string, regexpr("\\d", string))
 ## [1] "3"

Or using stringi

 library(stringi)
 stri_extract_first(string, regex = "\\d")
 ## [1] "3"

Or using stringr

 library(stringr)
 str_extract(string, "\\d")
 ## [1] "3"
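Since the question asks to save the value, here is a minimal base R sketch that stores the first digit as an integer. The variable names (first_digit, x, matched) are only illustrative. It also shows one thing to watch for: with regexpr, regmatches returns only the elements that matched, so on a vector the result can be shorter than the input.

```r
string <- "ABC3JFD456"

# regexpr() locates the first match; regmatches() extracts the matched text
first_digit <- as.integer(regmatches(string, regexpr("\\d", string)))

# With a vector, elements without any digit are dropped from the result
x <- c("ABC3JFD456", "NODIGITS", "ARST4DS324")
matched <- regmatches(x, regexpr("\\d", x))
length(matched)  # shorter than length(x) because "NODIGITS" has no match
```

If the result has to line up with the input vector, str_extract and stri_extract_first return NA for elements without a match instead of dropping them.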
+11

1) sub Try sub with the regular expression below, which matches the shortest string up to the first digit, then the digit itself, then everything after it, and replaces the whole match with just that digit:

 sub(".*?(\\d).*", "\\1", string) 

giving:

 [1] "3" 

This also works if string is a vector of strings.
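One caveat with the sub approach on vectors: when an element contains no digit, the pattern does not match and sub returns that element unchanged. A small sketch of guarding against that follows. The names x and res are illustrative, and perl = TRUE is my addition here so that the lazy quantifier is handled by PCRE; the pattern itself is the one from the answer.

```r
x <- c("ABC3JFD456", "NODIGITS", "ARST4DS324")

# Same pattern as above; elements without a digit come back untouched
res <- sub(".*?(\\d).*", "\\1", x, perl = TRUE)

# Replace the untouched elements with NA so they are not mistaken for digits
res[!grepl("\\d", x)] <- NA_character_
res
```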

2) strapplyc You can also use strapplyc from gsubfn, in which case an even simpler regular expression suffices:

 library(gsubfn)
 strapplyc(string, "\\d", simplify = TRUE)[1] 

This gives the same result. Alternatively, use the following, which gives the same answer again but also works if string is a vector of strings:

 sapply(strapplyc(string, "\\d"), "[[", 1) 
+6

Get the locations of the digits

 tmp <- gregexpr("[0-9]", string)
 iloc <- unlist(tmp)[1]

Extract the first digit

 as.numeric(substr(string,iloc,iloc)) 
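Note that unlist(tmp)[1] only suits a single string, because unlist merges the positions of all elements into one vector. If string were a vector, a per-element variant could look like the sketch below. The names x, all_digits, first_chr and first_num are illustrative, not part of the answer.

```r
x <- c("ABC3JFD456", "NODIGITS", "ARST4DS324")

# gregexpr() finds every digit; regmatches() returns one character vector per element
all_digits <- regmatches(x, gregexpr("[0-9]", x))

# Take the first digit of each element, or NA when the element has none
first_chr <- vapply(
  all_digits,
  function(d) if (length(d) > 0) d[1] else NA_character_,
  character(1)
)
first_num <- as.numeric(first_chr)
```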

Using regexpr is easier

 tmp <- regexpr("[0-9]", string)
 # regexpr returns -1 when there is no match
 if (tmp[[1]] >= 0) {
   iloc <- tmp[1]
   num <- as.numeric(substr(string, iloc, iloc))
 }
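The if check above handles one string at a time. As a rough sketch, the same regexpr idea can be wrapped in a vectorized helper; first_digit_num is a hypothetical name, and returning NA for strings without a digit is my assumption about the desired behavior.

```r
# Hypothetical helper: first digit of each string as a number, NA if none
first_digit_num <- function(x) {
  pos <- as.integer(regexpr("[0-9]", x))  # position of first digit, -1 if absent
  out <- substr(x, pos, pos)              # one-character substring at that position
  out[pos == -1L] <- NA_character_        # mark strings that had no digit
  as.numeric(out)
}

first_digit_num(c("ABC3JFD456", "NODIGITS", "ARST4DS324"))
```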
+2

Using rex can make this type of task a little easier.

 library(rex)
 string = c("ABC3JFD456", "ARST4DS324")
 re_matches(string, rex(capture(name = "first_number", digit)))
 #>   first_number
 #> 1            3
 #> 2            4
+1

 > which( sapply( strsplit(string, ""), grepl, patt="[[:digit:]]"))[1]
 [1] 4

Or

 > gregexpr("[[:digit:]]", string)[[1]][1]
 [1] 4

So:

 > splstr <- strsplit(string, "")
 > splstr[[1]][ which( sapply( splstr, grepl, patt="[[:digit:]]"))[1] ]
 [1] "3"

Note that the full result of calling gregexpr is a list, so you need to extract its first element using "[[":

 > gregexpr("[[:digit:]]", string)
 [[1]]
 [1]  4  8  9 10
 attr(,"match.length")
 [1] 1 1 1 1
 attr(,"useBytes")
 [1] TRUE
0

A gsub solution based on replacing the substrings preceding and following the first digit with an empty string:

 gsub("^\\D*(?=\\d)|(?<=\\d).*", "", string, perl = TRUE) # [1] "3" 
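To see what each half of that pattern does, the two alternatives can be applied one at a time. This is only an illustrative breakdown (step1 and step2 are made-up names), not a different method.

```r
string <- "ABC3JFD456"

# First alternative: strip the leading non-digits that sit right before a digit
step1 <- sub("^\\D*(?=\\d)", "", string, perl = TRUE)

# Second alternative: strip everything that follows the first digit
step2 <- sub("(?<=\\d).*", "", step1, perl = TRUE)

step1  # string with the leading letters removed
step2  # only the first digit remains
```

Both lookarounds need perl = TRUE, since R's default regex engine does not support them.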
0

Source: https://habr.com/ru/post/978964/

