R delete repeating sequences of numbers

I am trying to delete all digits in a string except the first set of digits. In other words, I want to remove every later set of digits. A line can have 1 set or 10+ sets, but I want to keep the first set along with the rest of the line.

For example, the following line:

x <- 'foo123bar123baz123456abc1111def123456789' 

Result:

 foo123barbazabcdef 

I tried using gsub to replace \d+ with an empty string, but that removes all the digits in the string. I also tried using groups to capture parts of the match, but had no luck.

2 answers

With gsub, you can use \G, an anchor that can match at one of two positions: the start of the string, or the position where the previous match ended.

 x <- 'foo123bar123baz123456abc1111def123456789'
 gsub('(?:\\d+|\\G(?<!^)\\D*)\\K\\d*', '', x, perl=T)
 # [1] "foo123barbazabcdef"

Explanation

 (?:           # group, but do not capture:
   \d+         #   digits (0-9) (1 or more times)
  |            #  OR
   \G(?<!^)    #   contiguous to the preceding match, not at the start of the string
   \D*         #   non-digits (all but 0-9) (0 or more times)
 )\K           # end of grouping; \K drops everything matched so far from the reported match
 \d*           # digits (0-9) (0 or more times)

Alternatively, you can use an optional group:

 gsub('(?:^\\D*\\d+)?\\K\\d*', '', x, perl=T) 

Another approach that I find useful, and that requires neither the backtracking verbs (*SKIP)(*F) nor \G and \K, is alternation: put what you want to keep in a capture group on the left side of the alternation, and put what you want to discard on the right side (in effect saying "throw this away, it's trash").

 gsub('^(\\D*\\d+)|\\d+', '\\1', x) 
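
To illustrate how the capture-group alternation behaves on other inputs, here is a minimal sketch. The helper name keep_first_digits and all test strings except the first are assumptions made for this example; they are not part of the original answer.

```r
# Hypothetical helper (name chosen for this sketch): keep only the first
# run of digits in each string and drop every later run.
keep_first_digits <- function(s) {
  # Left branch: leading non-digits plus the first digit run, captured as \1.
  # Right branch: any later digit run; \1 is empty there, so the run is deleted.
  gsub('^(\\D*\\d+)|\\d+', '\\1', s)
}

inputs <- c('foo123bar123baz123456abc1111def123456789',  # several digit runs
            'abc',                                        # no digits at all
            '42xyz7',                                     # starts with digits
            'only99')                                     # a single digit run

keep_first_digits(inputs)
```

The four results should be foo123barbazabcdef, abc, 42xyz and only99: strings with no digits or a single run of digits pass through unchanged, because the right-hand branch never finds anything to discard.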

You can do this using the PCRE verbs (*SKIP)(*F).

 ^\D*\d+(*SKIP)(*F)|\d+ 

^\D*\d+ matches everything from the start of the string through the first run of digits. (*SKIP)(*F) then forces that match to fail and stops the engine from retrying inside the text it has just skipped. From there on, only the pattern to the right of |, \d+, can match, and it matches each later run of digits in the rest of the string. Since (*SKIP) and (*F) are PCRE verbs, you need to set perl=TRUE.


code:

 > x <- 'foo123bar123baz123456abc1111def123456789'
 > gsub("^\\D*\\d+(*SKIP)(*F)|\\d+", "", x, perl=TRUE)
 [1] "foo123barbazabcdef"
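
Since gsub is vectorized, the same call works on a whole character vector. The sketch below applies the (*SKIP)(*F) pattern to a few strings and compares the result with the capture-group alternation from the other answer. The second and third input strings and the variable names with_verbs and with_group are assumptions made for this example.

```r
# Illustrative inputs; only the first string comes from the question.
x <- c('foo123bar123baz123456abc1111def123456789',
       'a1b22c333',
       'no digits here')

# (*SKIP)(*F) protects the first digit run; the right-hand \d+ removes the rest.
with_verbs <- gsub('^\\D*\\d+(*SKIP)(*F)|\\d+', '', x, perl = TRUE)

# Capture-group alternation from the other answer, for comparison.
with_group <- gsub('^(\\D*\\d+)|\\d+', '\\1', x)

# Both approaches are expected to agree on these inputs.
identical(with_verbs, with_group)
```

On these inputs both calls should give foo123barbazabcdef, a1bc and the unchanged string "no digits here", so the comparison returns TRUE.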

Source: https://habr.com/ru/post/978889/

