How to delete words of a certain length in a string in R?

I want to delete words shorter than 3 characters from each line. For example, my input is:

str<- c("hello RP have a nice day")

I want the result to be:

str<- c("hello have nice day")

Please help.

4 answers

Try the following:

gsub('\\b\\w{1,2}\\b','',str)
[1] "hello  have  nice day"

EDIT: \b (written '\\b' in an R string) is a word boundary. If you also need to get rid of the extra spaces, change it to:

gsub('\\b\\w{1,2}\\s','',str)

or

gsub('(?<=\\s)(\\w{1,2}\\s)','',str,perl=TRUE)
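Both EDIT patterns require a whitespace character after the short word, so a short word at the very end of the string survives: gsub('\\b\\w{1,2}\\s','',"be nice to me") returns "nice me". A minimal sketch that avoids this, assuming it is acceptable to collapse every run of whitespace to a single space, removes the words with the first pattern and then tidies the spaces. The helper name remove_short_words is hypothetical, and using perl = TRUE is a choice made here, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): remove words of
# 1-2 word characters, then tidy the whitespace left behind.
remove_short_words <- function(x) {
  # regex \b\w{1,2}\b: a whole word of one or two word characters.
  # perl = TRUE selects the PCRE engine (a choice made for this sketch).
  no_short <- gsub("\\b\\w{1,2}\\b", "", x, perl = TRUE)
  # collapse whitespace runs to one space, then trim both ends
  trimws(gsub("\\s+", " ", no_short))
}

remove_short_words("hello RP have a nice day")
#> [1] "hello have nice day"
remove_short_words("be nice to me")
#> [1] "nice"
```

The trade-off is that any pre-existing double spaces or tabs in the input are also normalised to single spaces.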

Or use str_extract_all to extract all words of length >= 3 and paste them back together:

library(stringr)
paste(str_extract_all(str, '\\w{3,}')[[1]], collapse=' ')
#[1] "hello have nice day"
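The [[1]] in this answer means only the first element of str is processed. A sketch of the same idea for a whole character vector, swapping str_extract_all for base R's gregexpr() plus regmatches() so no extra package is needed; keep_long_words and its min_len argument are hypothetical names:

```r
# Sketch: keep only runs of min_len or more word characters in every element.
keep_long_words <- function(x, min_len = 3) {
  pattern <- sprintf("\\w{%d,}", min_len)  # e.g. \w{3,}
  # gregexpr finds all matches; regmatches returns them as a list
  # with one character vector per input string
  words <- regmatches(x, gregexpr(pattern, x))
  # paste each element's words back together with single spaces
  vapply(words, paste, character(1), collapse = " ")
}

keep_long_words(c("hello RP have a nice day", "an ox is big"))
#> [1] "hello have nice day" "big"
```

An element with no qualifying words comes back as "", because paste(character(0), collapse = " ") returns an empty string.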
A base-R alternative: split on spaces and keep only the words with at least 3 characters:

x <- "hello RP have a nice day"
z <- unlist(strsplit(x, split=" "))
paste(z[nchar(z)>=3], collapse=" ")
# [1] "hello have nice day"
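The same split-and-filter idea can be applied to every element of a vector. This sketch splits on runs of whitespace, so repeated spaces do not produce empty pieces; drop_short_words and min_len are illustrative names, not from the original answer. Note that nchar() counts every character, so punctuation attached to a word counts toward its length, unlike the \w-based regexes above.

```r
# Sketch: split-and-filter for each string in a character vector.
drop_short_words <- function(x, min_len = 3) {
  # one list element per input string, split on runs of whitespace
  pieces <- strsplit(x, "\\s+")
  # keep pieces with at least min_len characters, rejoin with one space
  vapply(pieces,
         function(z) paste(z[nchar(z) >= min_len], collapse = " "),
         character(1))
}

drop_short_words(c("hello RP have a nice day", "an ox is big"))
#> [1] "hello have nice day" "big"
drop_short_words("so, it is fine")
#> [1] "so, fine"
```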

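Here is an approach using the rm_nchar_words function from the qdapRegex package, co-authored with @hwnd (SO regex guru extraordinaire). Below, it removes words of 1-2 characters, and then words of 1-3 characters: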

str<- c("hello RP have a nice day")

library(qdapRegex)

rm_nchar_words(str, "1,2")
## [1] "hello have nice day"

rm_nchar_words(str, "1,3")
## [1] "hello have nice"

Since this is qdapRegex, it is worth studying the underlying regular expression. Here the function S inserts 1,2 into the quantifier braces:

S("@rm_nchar_words", "1,2")
##  "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"
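The printed pattern can also be used directly with base R's gsub(perl = TRUE), which shows why it is more careful than a plain \b\w{1,2}\b: apostrophes count as part of a word, so contractions are not split apart. A minimal sketch, assuming the pattern string is copied verbatim from the S() output above; short_word_re and strip_and_squish are hypothetical names:

```r
# Pattern as printed by S("@rm_nchar_words", "1,2") above.
short_word_re <- "(?<![\\w'])(?:'?\\w'?){1,2}(?![\\w'])"

# Hypothetical helper: remove all matches (PCRE is needed for the
# lookarounds), then collapse and trim the leftover whitespace.
strip_and_squish <- function(x, pattern) {
  out <- gsub(pattern, "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", out))
}

strip_and_squish("we can't stay", short_word_re)
#> [1] "can't stay"
strip_and_squish("we can't stay", "\\b\\w{1,2}\\b")
#> [1] "can' stay"
```

With the plain word-boundary pattern, the t after the apostrophe is treated as a one-letter word and removed; the qdapRegex pattern keeps "can't" intact.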

Source: https://habr.com/ru/post/1612344/

