How to trim and replace whitespace in a string

    string <- c("       this is a string  ")

Is it possible in R to take the white space on both sides of a string (or on only one side, as needed) and replace it with a chosen character? The number of spaces differs on each side and must be preserved in the replacement, for example:

 "~~~~~~~this is a string~~" 
3 answers

This may not be the most efficient way to do it, but consider gregexpr and regmatches rather than gsub alone:

    x <- "    this is a string  "
    pattern <- "^ +?\\b|\\b? +$"
    # Replace spaces with "~" inside the leading and trailing matches only
    startstop <- gsub(" ", "~", regmatches(x, gregexpr(pattern, x))[[1]])
    # Everything that is not a leading or trailing run of spaces
    text <- paste(regmatches(x, gregexpr(pattern, x), invert = TRUE)[[1]], collapse = "")
    paste0(startstop[1], text, startstop[2])
    # [1] "~~~~this is a string~~"

And, for fun, as a function and a "vectorized" function:

    ## The function
    replaceEnds <- function(string) {
      pattern <- "^ +?\\b|\\b? +$"
      startstop <- gsub(" ", "~", regmatches(string, gregexpr(pattern, string))[[1]])
      text <- paste(regmatches(string, gregexpr(pattern, string), invert = TRUE)[[1]], collapse = "")
      paste0(startstop[1], text, startstop[2])
    }

    ## use Vectorize here if you want to apply over a vector
    vReplaceEnds <- Vectorize(replaceEnds)

Some sample data:

    myStrings <- c("    Four at the start, 2 at the end  ",
                   "   three at the start, one at the end ")
    vReplaceEnds(myStrings)
    #     Four at the start, 2 at the end      three at the start, one at the end
    # "~~~~Four at the start, 2 at the end~~" "~~~three at the start, one at the end~"
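As a variation on the same idea, the replacement form of regmatches can rewrite the matched ends in place, which avoids Vectorize. This is a minimal sketch, not the answer author's method; the simplified pattern "^ +| +$" and the variable names x and m are assumptions made for illustration.

```r
# Sample strings with differing amounts of leading and trailing spaces
x <- c("    Four at the start, 2 at the end  ",
       "   three at the start, one at the end ")

# Locate the leading and trailing runs of spaces in every element
m <- gregexpr("^ +| +$", x)

# Swap each space for "~" inside the matched runs only, then write them back
regmatches(x, m) <- lapply(regmatches(x, m), function(s) gsub(" ", "~", s))

x
# [1] "~~~~Four at the start, 2 at the end~~"
# [2] "~~~three at the start, one at the end~"
```

Because gregexpr and regmatches already work element by element, the whole vector is handled in one pass.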

Use gsub :

    gsub(" ", "~", "    this is a string  ")
    [1] "~~~~this~is~a~string~~"

This function uses regular expressions to replace (i.e., sub) all occurrences of the pattern within a string.

In your case, the pattern has to be restricted to the ends of the string:

    gsub("(^ *)|( *$)", "~~~", "    this is a string  ")
    [1] "~~~this is a string~~~"

The pattern means:

  • (^ *) : find zero or more spaces at the start of the line
  • ( *$) : find zero or more spaces at the end of the line
  • | : the OR operator

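This replaces each run of spaces with a fixed string, so the counts are lost. To replace every space individually, split the string into its leading spaces, middle, and trailing spaces, and substitute only in the two ends: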

    txt <- "    this is a string  "
    foo <- function(x, new = "~") {
      lead <- gsub("(^ *).*", "\\1", x)      # leading spaces only
      last <- gsub(".*?( *$)", "\\1", x)     # trailing spaces only
      mid  <- gsub("(^ *)|( *$)", "", x)     # text with both ends stripped
      paste0(gsub(" ", new, lead), mid, gsub(" ", new, last))
    }

    > foo("    this is a string  ")
    [1] "~~~~this is a string~~"
    > foo(" And another one        ")
    [1] "~And another one~~~~~~~~"

See ?gsub or ?regexp for more details.
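The question also asks about replacing on only one side, which foo() does not offer. The following is a minimal sketch of one way to add that, assuming R 3.3.0 or later for strrep; the function name pad_ends and its side argument are hypothetical and not part of the answer above.

```r
# Replace leading and/or trailing spaces with `new`, keeping their counts.
# `pad_ends` and `side` are illustrative names, not an existing API.
pad_ends <- function(x, new = "~", side = c("both", "left", "right")) {
  side <- match.arg(side)

  rest  <- sub("^ +", "", x)             # string without leading spaces
  lead  <- nchar(x) - nchar(rest)        # number of leading spaces
  core  <- sub(" +$", "", rest)          # string without either end
  trail <- nchar(rest) - nchar(core)     # number of trailing spaces

  # Use `new` on the requested side(s); keep plain spaces on the other side
  left  <- strrep(if (side == "right") " " else new, lead)
  right <- strrep(if (side == "left")  " " else new, trail)

  paste0(left, core, right)
}

both  <- pad_ends("    this is a string  ")
left  <- pad_ends("    this is a string  ", side = "left")
right <- pad_ends("    this is a string  ", side = "right")
```

Here both is "~~~~this is a string~~", left keeps the two trailing spaces, and right keeps the four leading spaces. Counting with nchar instead of capturing groups keeps the function vectorized over x without extra work.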


Or use a more complex pattern with a single gsub call:

    gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", "    this is a string  ", perl = TRUE)
    #[1] "~~~~this is a string~~"

Or with @AnandaMahto's data:

    gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", myStrings, perl = TRUE)
    #[1] "~~~~Four at the start, 2 at the end~~"
    #[2] "~~~three at the start, one at the end~"

Explanation

It uses positive and negative lookahead and lookbehind assertions:

  • \\s(?!\\b) - matches a space, \\s , that is not followed by a word boundary, (?!\\b) . On its own this works for everything except the last space before the first word, so by itself it would give "~~~ this is a string~~" . Therefore, we need a second alternative ...

  • (?<=\\s)\\s(?=\\b) - matches a space, \\s , that is preceded by another space, (?<=\\s) , and followed by a word boundary, (?=\\b) .

And because this is gsub , every match in the string is replaced, not just the first one.
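Following the two alternatives through by hand suggests two edge cases worth checking before relying on this pattern: a single leading space is neither followed by another space nor preceded by one, and a double space between words satisfies both alternatives. A short sketch to check this; the test strings and variable names are made up for illustration.

```r
pattern <- "\\s(?!\\b)|(?<=\\s)\\s(?=\\b)"

# A single leading space matches neither alternative, so it is left alone
one_lead <- gsub(pattern, "~", " And another one   ", perl = TRUE)
one_lead
# [1] " And another one~~~"

# A double space between words is replaced as well
inner_gap <- gsub(pattern, "~", "  two  gaps  ", perl = TRUE)
inner_gap
# [1] "~~two~~gaps~~"
```

If either case can occur in the data, an approach that anchors explicitly on ^ and $ is likely the safer choice.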


Source: https://habr.com/ru/post/1500255/

