Remove single characters from a string

I am looking for regular expressions that remove single characters (initials) from strings under two different conditions. One regular expression should remove every single character in a string; the other should remove only single characters that sit between the first and last names. See the examples below.

Remove all single characters:

Before

names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

After:

"John Smith", "Chris Anderson", "Mary Jane", "Smith", "Thomas"

Remove only single characters between the first and last names:

Before:

names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

After:

"John Smith", "Chris Anderson", "Mary Jane", "J. J. Smith", "J. Thomas"
2 answers

Edited because I missed part of the question

gsub can remove the pattern from your data. Here we delete single characters that have a multi-character string both before and after them.

gsub("(\\w\\w)\\W+\\w\\W+(\\w\\w)", "\\1 \\2", names)
[1] "John Smith"     "Chris Anderson" "Mary Jane"   "J. J. Smith" "J. Thomas"
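
To see why the two capture groups are needed, it may help to look at what the pattern actually matches. The sketch below uses base R's regexpr and regmatches to extract the matched text; the variable names pattern and matched are only illustrative.

```r
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

pattern <- "(\\w\\w)\\W+\\w\\W+(\\w\\w)"

# regmatches() returns the matched text only for elements that matched,
# so the two names with leading initials drop out of the result.
matched <- regmatches(names, regexpr(pattern, names))
print(matched)
```

The match swallows two word characters on each side of the initial, which is why the replacement has to put them back with \\1 and \\2.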

To get rid of all of them (note that this leaves leading spaces where an initial came first):

gsub("\\W*\\b\\w\\b\\W*", " ", names)
[1] "John Smith"     "Chris Anderson" "Mary Jane"      "  Smith"        " Thomas" 
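
If the leftover spaces are a problem, one possible cleanup is to collapse runs of whitespace and then trim the ends. This is a minimal sketch, assuming base R 3.2.0 or later for trimws; the variable names stripped and cleaned are only illustrative.

```r
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

# Replace every standalone single character (and the non-word characters
# around it) with a single space, as in the call above.
stripped <- gsub("\\W*\\b\\w\\b\\W*", " ", names)

# Collapse runs of whitespace into one space, then drop leading and
# trailing whitespace.
cleaned <- trimws(gsub("\\s+", " ", stripped))
print(cleaned)
```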

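# Remove every capital letter followed by punctuation (an initial),
# together with any whitespace after it
gsub("\\b[A-Z][[:punct:]]\\s*", "", names)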
#[1] "John Smith"     "Chris Anderson" "Mary Jane"      "Smith"         
#[5] "Thomas"        

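# Remove initials only when they follow a word and whitespace,
# so initials at the start of the string are kept
sub("(\\w+)\\s+([A-Z][[:punct:]]\\s*){1,}", "\\1 ", names)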
#[1] "John Smith"     "Chris Anderson" "Mary Jane"      "J. J. Smith"   
#[5] "J. Thomas"     
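
If both behaviours are needed in the same script, the two calls above could be wrapped in one small function. This is only a sketch: drop_initials and its keep_leading argument are hypothetical names, not part of the answer, and the patterns are the same ones used above.

```r
# Hypothetical helper: the function name and keep_leading argument are
# illustrative, the two patterns come from the calls above.
drop_initials <- function(x, keep_leading = FALSE) {
  if (keep_leading) {
    # Only initials that follow a word and whitespace are removed.
    sub("(\\w+)\\s+([A-Z][[:punct:]]\\s*){1,}", "\\1 ", x)
  } else {
    # Every capital letter followed by punctuation is removed.
    gsub("\\b[A-Z][[:punct:]]\\s*", "", x)
  }
}

names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

all_removed  <- drop_initials(names)
middle_only  <- drop_initials(names, keep_leading = TRUE)
print(all_removed)
print(middle_only)
```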

Source: https://habr.com/ru/post/1665566/

