Delete lines containing a colon in R

Question

This is a sample excerpt from my dataset:

Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id_234;2018/03/02

I want to remove the words that contain a colon. In this case those are wa119:d, ax21:3 and bC230:13, so my new dataset should look like this:

Description;ID;Date
Here comes the first row;id_112;2018/03/02
Here comes the second row;id_115;2018/03/02
Here comes the third row;id_234;2018/03/02

Unfortunately, I could not find a regex or a gsub solution for this. Can anyone help?

string replace r

Ferit Mar 03 '18 at 20:46

3 answers

Suppose the column you want to change is dat:

dat <- c("wa119:d Here comes the first row",
         "ax21:3 Here comes the second row",
         "bC230:13 Here comes the third row")

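Then split each element into words, drop the words that contain a colon, and paste the remaining words back together: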

dat_colon_words_removed <- unlist(lapply(dat, function(string){
  words <- strsplit(string, split=" ")[[1]]
  words <- words[!grepl(":", words)]
  paste(words, collapse=" ")
}))
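
The same split-and-filter idea can be wrapped in a small reusable function. This is only a minimal sketch, not part of the original answer: the name remove_colon_words is hypothetical, and it assumes words are separated by single spaces.

```r
# Hypothetical helper: drop every space-separated word that contains a colon
remove_colon_words <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    # keep only the words without a literal ":" and glue them back together
    paste(words[!grepl(":", words, fixed = TRUE)], collapse = " ")
  }, character(1))
}

dat <- c("wa119:d Here comes the first row",
         "ax21:3 Here comes the second row",
         "bC230:13 Here comes the third row")

cleaned <- remove_colon_words(dat)
cleaned
# [1] "Here comes the first row"  "Here comes the second row"
# [3] "Here comes the third row"
```

Unlike a pattern that requires trailing whitespace, this version also drops a colon word that happens to be the last word of a string.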

lefft Mar 03 '18 at 21:00

Another solution that exactly matches the OP's expected result:

#data
df <- read.table(text = "Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02", stringsAsFactors = FALSE, sep="\n")

gsub("[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s", "", df$V1)

#[1] "Description;ID;Date"                        
#[2] "Here comes the first row;id_112;2018/03/02" 
#[3] "Here comes the second row;id_115;2018/03/02"
#[4] "Here comes the third row;id:234;2018/03/02"

MKR Mar 03 '18 at 21:20

Tyler Rinker · Accepted Answer · 2018-03-03T20:54:12+0000

Here is one approach:

## reading in your data
dat <- read.table(text ='
Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02
', sep = ';', header = TRUE, stringsAsFactors = FALSE)

## \\w+ = one or more word characters
gsub('\\w+:\\w+\\s+', '', dat$Description)

## [1] "Here comes the first row"  
## [2] "Here comes the second row"
## [3] "Here comes the third row"
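
To get the whole cleaned dataset back rather than just a character vector, one option is to assign the result to the column and write the table out with the original separator. A sketch, assuming the output should be printed to the console in the same semicolon-separated layout:

```r
dat <- read.table(text = "Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02",
  sep = ";", header = TRUE, stringsAsFactors = FALSE)

# Overwrite only the Description column; ID and Date are never passed to gsub
dat$Description <- gsub("\\w+:\\w+\\s+", "", dat$Description)

# Print in the original semicolon-separated format (file = "" means the console)
write.table(dat, file = "", sep = ";", quote = FALSE, row.names = FALSE)
```

Because only dat$Description goes through gsub, the id:234 value in the ID column stays as it is.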

More information about the \w shorthand character class (written '\\w' in an R string), which is the same as [A-Za-z0-9_]: https://www.regular-expressions.info/shorthand.html
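
The underscore is where the patterns used in the two regex answers differ. A small illustration, using a made-up input string that is not from the question:

```r
x <- "my_tag:1 Here comes a row"   # hypothetical input containing an underscore

# \\w includes "_", so the whole word "my_tag:1" is removed
res_w <- gsub("\\w+:\\w+\\s+", "", x)
res_w
# [1] "Here comes a row"

# [a-zA-Z0-9] excludes "_", so only "tag:1 " is removed and "my_" is left behind
res_alnum <- gsub("[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s", "", x)
res_alnum
# [1] "my_Here comes a row"
```

For the data in the question both patterns give the same result, since none of the colon words contain an underscore.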
