Regular expression: replace the nth occurrence

Does anyone know how to find the nth occurrence of a string in an expression and replace it using a regular expression?

For example, I have the following string:

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa" 

and I want to replace the 5th occurrence of "-" with "|" and the 7th occurrence of "-" with "||", giving:

 [1] aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa 

How can I do this?

Thanks, Florian

3 answers

(1) sub. This can be done in a single regex with one call to sub:

 > sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
 [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
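
To make the single regex in (1) easier to read, here is a small sketch that assembles the same pattern from named pieces. The variable names (before_5th, between, pattern, result) are only illustrative and not part of the original answer.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Group 1: four lazily matched chunks ending in "-" (the inner group 2),
# followed by the text up to, but not including, the 5th "-"
before_5th <- "(^(.*?-){4}.*?)"

# Group 3: the text between the 5th and 7th "-", including the 6th "-"
between <- "(.*?-.*?)"

# The two literal "-" outside the groups are the 5th and 7th occurrences;
# they are dropped and replaced by "|" and "||" in the replacement string
pattern <- paste0(before_5th, "-", between, "-")

result <- sub(pattern, "\\1|\\3||", txt, perl = TRUE)
result
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
```

Because both occurrences are handled in one match, there is no need to worry about the order of the two replacements here.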

(2) sub twice. Alternatively, call sub twice, replacing the 7th occurrence first so that the position of the 5th is unaffected:

 > txt2 <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)  # 7th "-" becomes "||"
 > sub("(^(.*?-){4}.*?)-", "\\1|", txt2, perl = TRUE)          # 5th "-" becomes "|"
 [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

(3) sub.fun. This variation defines a function, sub.fun, that performs a single substitution. It uses fn$ from the gsubfn package to interpolate n-1, pat and value into the arguments of sub. First define the function, then call it twice, replacing the 7th occurrence before the 5th so that the earlier one does not shift the count.

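 library(gsubfn)
 # Replace the n-th occurrence of pat in x with value.
 # fn$ interpolates `n-1`, $pat and $value into the string arguments of sub.
 # The counting group uses $pat too, so pat is no longer hard-coded as "-".
 sub.fun <- function(x, pat, n, value) {
   fn$sub("(^(.*?$pat){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
 }
 > sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
 [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"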

(We could instead build the arguments to sub in the body of sub.fun with paste or sprintf to get a base R solution, at the cost of some extra verbosity.)
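
A minimal sketch of what such a base R variant might look like, using sprintf to build the pattern. The name sub_nth is hypothetical and chosen here only for illustration. It assumes pat is a valid regex fragment and that value contains no backslashes.

```r
# Hypothetical base R counterpart of sub.fun (no gsubfn needed)
sub_nth <- function(x, pat, n, value) {
  # Skip n-1 lazily matched chunks ending in pat, then match the n-th pat
  regex <- sprintf("(^(.*?%s){%d}.*?)%s", pat, as.integer(n - 1), pat)
  # Keep everything before the n-th match (group 1) and append the new value
  sub(regex, paste0("\\1", value), x, perl = TRUE)
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Replace the 7th occurrence first so the 5th is still the 5th afterwards
sub_nth(sub_nth(txt, "-", 7, "||"), "-", 5, "|")
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
```

If there are fewer than n occurrences, the pattern does not match and sub returns the input unchanged.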

sub.fun can also be recast as a replacement function, giving this tidy sequence:

 "sub.fun<-" <- sub.fun
 tt <- txt  # make a copy so that we preserve the input txt
 sub.fun(tt, "-", 7) <- "||"
 sub.fun(tt, "-", 5) <- "|"
 > tt
 [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

(4) gsubfn. Using gsubfn from the gsubfn package, we can use a particularly simple regular expression (it's just a "-"), and the code has a fairly straightforward structure. The replacement is done by a proto method: the proto object containing the method is passed in place of the replacement string. The simplicity of this approach arises from the fact that gsubfn automatically makes the count variable available to such methods:

 library(gsubfn)  # gsubfn also pulls in proto
 p <- proto(fun = function(this, x) {
   if (count == 5) return("|")
   if (count == 7) return("||")
   x
 })
 > gsubfn("-", p, txt)
 [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

UPDATE: Some fixes.

UPDATE 2: Added the replacement-function approach to (3).

UPDATE 3: Added the pat argument to sub.fun.


An alternative is to use Hadley's stringr package, on which the function below is based:

 library(stringr)
 # Replace the n-th match of pattern in string with replacement
 replace.nth <- function(string, pattern, replacement, n) {
   locations <- str_locate_all(string, pattern)
   str_sub(string, locations[[1]][n, 1], locations[[1]][n, 2]) <- replacement
   string
 }
 txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
 # Replace the 7th "-" first; replacing the 5th first would shift the later counts
 txt.new <- replace.nth(txt, "-", "||", 7)
 txt.new <- replace.nth(txt.new, "-", "|", 5)
 txt.new
 # [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
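
One thing to keep in mind with any helper that replaces a single occurrence per call: each replacement removes a "-", so the numbering of the later occurrences shifts. The sketch below illustrates this with a hypothetical base R helper, replace_nth (not part of stringr or of the answer above), which treats pat as a fixed string.

```r
# Hypothetical base R helper: replace the n-th occurrence of a fixed string
replace_nth <- function(x, pat, n, value) {
  pos <- gregexpr(pat, x, fixed = TRUE)[[1]]
  if (pos[1] == -1 || n > length(pos)) return(x)  # fewer than n occurrences
  start <- pos[n]
  end <- start + attr(pos, "match.length")[n] - 1
  paste0(substr(x, 1, start - 1), value, substr(x, end + 1, nchar(x)))
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Ascending order: once the 5th "-" is gone, n = 7 hits the original 8th "-"
ascending <- replace_nth(replace_nth(txt, "-", 5, "|"), "-", 7, "||")
ascending
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa-aaa||aaa-aaa"

# Descending order: later occurrences are edited first, so earlier counts hold
descending <- replace_nth(replace_nth(txt, "-", 7, "||"), "-", 5, "|")
descending
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
```

Working from the highest occurrence number down (or using n = 6 for the second call) gives the result asked for in the question.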

One way to do this is to use gregexpr to find the positions of the "-" characters:

 posns <- gregexpr("-",txt)[[1]] 

Then paste together the appropriate fragments and separators:

 paste0(substr(txt, 1, posns[5] - 1), "|",
        substr(txt, posns[5] + 1, posns[7] - 1), "||",
        substr(txt, posns[7] + 1, nchar(txt)))
 [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
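
The same gregexpr idea can be sketched without the manual substr bookkeeping by using base R's regmatches replacement form, which splices new values back in at the matched positions. This is an alternative to the paste0 call above, not a restatement of it. The variable names m, seps and out are illustrative.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Locate every "-" (fixed = TRUE treats the pattern as a literal string)
m <- gregexpr("-", txt, fixed = TRUE)

# Extract the matched separators in order, then overwrite the 5th and 7th
seps <- regmatches(txt, m)[[1]]
seps[c(5, 7)] <- c("|", "||")

# Write the modified separators back into a copy of the string
out <- txt
regmatches(out, m) <- list(seps)
out
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
```

Since all positions are found before anything is changed, the occurrence numbers refer to the original string and the order of the replacements does not matter.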

Source: https://habr.com/ru/post/1483123/

