A regular expression that returns the digits following a certain letter, up to the next letter

I need a regular expression that matches a specific letter and the (one or two) digits that follow it, up to the next letter. For example, I would like to extract how many carbon atoms (C) are in each formula using regular expressions in R:

strings <- c("C16H4ClNO2", "CH8O", "F2Ni")

I need an expression that returns the number after C, which can be one or two digits, and that does not return the number after chlorine (Cl). My attempt so far:

substr(strings,regexpr("C[0-9]+",strings) + 1, regexpr("[ABDEFGHIJKLMNOPQRSTUVWXYZ]+",strings) -1)
[1] "16" "C"  ""  

but the answer I want to return is

"16","1","0"

In addition, I would like the regular expression to find the next letter automatically and stop in front of it, rather than having to specify the end position myself as any letter other than C.


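The makeup function in the CHNOSZ package parses chemical formulas. Here are some alternatives:

1) Build a list L of the parsed formulas and, for each one, return its "C" component if it has one, or 0 otherwise: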

library(CHNOSZ)

L <- Map(makeup, strings)
sapply(L, function(x) if ("C" %in% names(x)) x[["C"]] else 0)
## C16H4ClNO2       CH8O       F2Ni 
##         16          1          0 

Note that L is a list of the fully parsed formulas, in case the other elements are needed as well:

> L
$C16H4ClNO2
 C  H Cl  N  O 
16  4  1  1  2 

$CH8O
C H O 
1 8 1 

$F2Ni
 F Ni 
 2  1 

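1a) By appending c(C = 0) to each component, the test for the presence of carbon can be dropped, which gives a shorter version of the sapply line in (1):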

sapply(lapply(L, c, c(C = 0)), "[[", "C")

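2) This variation of (1) gives the same values as (1), except without names. It appends "C0" to each formula so that carbon is always present: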

sapply(lapply(paste0(strings, "C0"), makeup), "[[", "C")
## [1] 16  1  0

2a) A variation of (2) that drops lapply, since makeup also accepts a matrix:

sapply(makeup(as.matrix(paste0(strings, "C0"))), "[[", "C")
## [1] 16  1  0
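
For readers who do not have CHNOSZ installed, the kind of per-element breakdown shown in L can be approximated in base R. This is only a rough sketch: parse_formula is a hypothetical helper, not part of CHNOSZ, and it assumes flat formulas with no parentheses, charges, or isotopes.

```r
# Hypothetical helper (not the CHNOSZ implementation): split a flat formula
# into element tokens and return a named integer vector of counts.
parse_formula <- function(formula) {
  # Each token is an uppercase letter, an optional lowercase letter,
  # and optional digits, e.g. "C16", "Cl", "O2".
  tokens   <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)       # "C16" -> "C"
  digits   <- sub("^[A-Za-z]+", "", tokens)    # "C16" -> "16", "Cl" -> ""
  counts   <- as.integer(ifelse(digits == "", "1", digits))
  # Sum counts per element in case a symbol occurs more than once.
  vapply(split(counts, elements), sum, integer(1))
}

strings <- c("C16H4ClNO2", "CH8O", "F2Ni")
parsed  <- lapply(strings, parse_formula)
carbon  <- vapply(parsed,
                  function(x) if ("C" %in% names(x)) x[["C"]] else 0L,
                  integer(1))
carbon
# [1] 16  1  0
```

For anything beyond simple formulas, makeup remains the safer choice, since it is written for chemical formulas rather than pattern matching alone.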

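There are two cases to handle:

  • C followed by digits => extract the number
  • C followed by an UPPERCASE letter (no digits, and not a lowercase letter as in Cl) => count C

This can be done as follows: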

library("stringr")
strings <- c("C16H4ClNO2", "CH8O", "F2Ni")

str1 <- str_extract(strings, '(?<=C)\\d+')
str2 <- str_count(strings, 'C[A-Z]') 
str2[!is.na(str1)] = str1[!is.na(str1)]
str2
# [1] "16" "1"  "0" 

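Here, str1 holds the digits that follow C (NA where there are none), and str2 supplies the count for the remaining formulas.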


We can do it with base R:

sub("C(\\d+).*", "\\1", sub("C([^0-9]+)", 
  "C1\\1", ifelse(!grepl("C", strings), paste0("C0", strings), strings)))
#[1] "16" "1"  "0" 
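
A single pattern with a negative lookahead can also cover both requirements from the question (skip Cl, stop at the next letter). The following is a minimal sketch, assuming PCRE support via perl = TRUE and that only the first carbon group in each formula matters; count_carbon is a hypothetical name used for illustration.

```r
# Hypothetical helper: number of carbon atoms in the first "C" group.
count_carbon <- function(formulas) {
  # "C(?![a-z])" matches C only when it is not followed by a lowercase
  # letter (so Cl, Ca, Cu are skipped); "([0-9]*)" captures the digits,
  # which naturally end at the next letter.
  hits <- regmatches(formulas,
                     regexec("C(?![a-z])([0-9]*)", formulas, perl = TRUE))
  vapply(hits, function(h) {
    if (length(h) == 0) return(0L)   # no carbon at all
    if (h[2] == "") return(1L)       # bare C means one atom
    as.integer(h[2])
  }, integer(1))
}

strings <- c("C16H4ClNO2", "CH8O", "F2Ni")
count_carbon(strings)
# [1] 16  1  0
```

Because the digits are captured with [0-9]*, there is no need to spell out the end position. Formulas that mention carbon more than once (for example a condensed structural formula) would need gregexpr and a sum instead.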

Another option using stringr (note that it returns NA rather than "0" when the formula contains no C):

ifelse(str_extract(strings,'(?<=C)(\\d+|)')=='',1,str_extract(strings,'(?<=C)(\\d+|)'))
[1] "16" "1"  NA  

Source: https://habr.com/ru/post/1671820/

