Replace successive matches of one pattern in a string with different replacements

With the stringr package, regular expression replacements are easy to perform in a vectorized way.

Question: how can I do the following?

Replace every word in

 hello,world??your,make|[]world,hello,pos 

with a different replacement each, for example with increasing numbers:

 1,2??3,4|[]5,6,7 

Please note that simple delimiters cannot be assumed; the practical use case is more complicated.


stringr::str_replace_all does not seem to work here. Either a call such as

 str_replace_all(x, "(\\w+)", 1:7) 

produces a vector with one element per replacement, each applied to all words, or the input contains unknown and/or repeated words, so that

 str_replace_all(x, c("hello" = "1", "world" = "2", ...)) 

will not work for this purpose.

3 answers

Here is another idea using gsubfn. The pre function runs once before matching starts, and the fun function runs for each match:

 library(gsubfn)
 x <- "hello,world??your,make|[]world,hello,pos"
 p <- proto(pre = function(t) t$v <- 0,        # initialise the counter v to 0
            fun = function(t, x) t$v <- v + 1) # increment v and use it as the replacement
 gsubfn("\\w+", p, x)

This gives:

 [1] "1,2??3,4|[]5,6,7" 

This variant gives the same answer, because gsubfn maintains a count variable that proto functions can use:

 pp <- proto(fun = function(...) count)
 gsubfn("\\w+", pp, x)

See the gsubfn vignette for examples of using count.


I suggest the ore package for something like this. Of particular note are ore.search and ore.subst, the latter of which can take a function as the replacement value.

Examples:

 library(ore)
 x <- "hello,world??your,make|[]world,hello,pos"

 ## Match all and replace with the sequence in which they are found
 ore.subst("(\\w+)", function(i) seq_along(i), x, all = TRUE)
 # [1] "1,2??3,4|[]5,6,7"

 ## Create a cool ore object with details about what was extracted
 ore.search("(\\w+)", x, all = TRUE)
 #   match: hello world  your make   world hello pos
 # context:      ,     ??    ,    |[]     ,     ,
 #  number: 1==== 2====  3=== 4===   5==== 6==== 7==

Here is a base R solution. It would still need to be vectorized to handle more than one string.

 x <- "hello,world??your,make|[]world,hello,pos"
 # split x into single characters
 x_split <- strsplit(x, "")[[1]]
 # find all word-character positions and replace them with "a"
 x_split[gregexpr("\\w", x)[[1]]] <- "a"
 # find all runs of "a"
 rle_res <- rle(x_split)
 # replace run lengths by 1
 rle_res$lengths[rle_res$values == "a"] <- 1
 # replace run values by increasing numbers
 rle_res$values[rle_res$values == "a"] <- 1:sum(rle_res$values == "a")
 # use inverse.rle on the modified rle object and collapse into one string
 paste0(inverse.rle(rle_res), collapse = "")
 # [1] "1,2??3,4|[]5,6,7"
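One possible way to vectorize a base R approach is to swap the rle technique for gregexpr together with the replacement form of regmatches. The following is only a sketch under that assumption: replace_matches and its fn argument are hypothetical names introduced for illustration and do not come from the answers above.

```r
# Hypothetical helper (not part of any answer above): replace every match of
# `pattern` in each element of `x` with whatever `fn` returns for the matches.
replace_matches <- function(x, pattern, fn = seq_along) {
  m <- gregexpr(pattern, x)     # match positions, one list entry per string
  found <- regmatches(x, m)     # matched substrings, one vector per string
  # Build the replacements per string; they must be character vectors
  regmatches(x, m) <- lapply(found, function(w) as.character(fn(w)))
  x
}

x <- c("hello,world??your,make|[]world,hello,pos", "foo|bar")
replace_matches(x, "\\w+")
# [1] "1,2??3,4|[]5,6,7" "1|2"
```

Because fn receives all matches of one string at once, the numbering restarts for each element of x. Passing a different function, for example toupper, would transform the matches instead of numbering them.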

Source: https://habr.com/ru/post/986389/

