Ok, now I have a better solution :)
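    library(stringi)

    x <- c("This is the longest sentence in world, so now just make it longer",
           "No in fact, this is the longest sentence in entire world, world, world, world, the whole world")

    extract <- function(x){
      # Match at most 40 characters followed by a space or the end of the string
      # (the matched space is kept in the result)
      result <- stri_extract_first_regex(x, "^.{0,40}( |$)")
      # Append dots only to sentences that were actually cut
      longer <- stri_length(x) > 40
      result[longer] <- stri_paste(result[longer], "...")
      result
    }

    extract(x)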
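One caveat with the regex in extract(): the matched space stays in the result, so a cut sentence ends in " ...", and an input with no space within its first 41 characters gives NA. Below is a minimal sketch of a variant that avoids both, assuming a lookahead is acceptable here. The name extract_trimmed and the width argument are my own illustration, not part of the code above.

```r
library(stringi)

# Hypothetical variant of extract(); the name and `width` are illustrative.
extract_trimmed <- function(x, width = 40) {
  # The lookahead (?= |$) requires a space or the end of the string after
  # the prefix, but does not include that space in the match.
  pattern <- stri_paste("^.{0,", as.integer(width), "}(?= |$)")
  result <- stri_extract_first_regex(x, pattern)

  # No word boundary within `width` characters: fall back to a hard cut.
  no_break <- is.na(result) & !is.na(x)
  result[no_break] <- stri_sub(x[no_break], 1, width)

  # Append dots only to inputs that were actually shortened.
  longer <- !is.na(x) & stri_length(x) > width
  result[longer] <- stri_paste(result[longer], "...")
  result
}

extract_trimmed(c("short",
                  "This is the longest sentence in world, so now just make it longer"))
```

It is still fully vectorized, so it should stay in the same speed range as extract(), though I have not benchmarked it.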
Benchmark of the new version against the old one (32,000 sentences); by median time the new version is roughly 64 times faster:
    microbenchmark(sapply(x, cutAndAddDots, USE.NAMES = FALSE), extract(x), times=5)
    Unit: milliseconds
                                            expr        min         lq     median         uq      max neval
     sapply(x, cutAndAddDots, USE.NAMES = FALSE) 3762.51134 3762.92163 3767.87134 3776.03706 3788.139     5
                                      extract(x)   56.01727   57.18771   58.50321   79.55759   97.924     5
OLD VERSION
This solution requires the stringi package, and ALWAYS adds three dots ... to the end of the line.
    require(stringi)
    sapply(x, function(x) stri_paste(stri_wrap(x, 40)[1], "..."), USE.NAMES = FALSE)
This adds three dots only to sentences longer than 40 characters:
    require(stringi)

    cutAndAddDots <- function(x){
      w <- stri_wrap(x, 40)
      if(length(w) > 1){
        stri_paste(w[1], "...")
      }else{
        w[1]
      }
    }

    sapply(x, cutAndAddDots, USE.NAMES = FALSE)
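One thing to keep in mind: as far as I understand the stringi documentation, stri_wrap defaults to cost_exponent = 2, which balances line lengths with dynamic programming, so the first line is not guaranteed to be the longest prefix that fits in 40 characters. A minimal sketch of a greedy variant follows, assuming a stringi version whose stri_wrap has the cost_exponent argument. The name cut_greedy and the width argument are only for illustration.

```r
library(stringi)

# Hypothetical helper; the name and the `width` argument are illustrative.
cut_greedy <- function(x, width = 40) {
  vapply(x, function(s) {
    # cost_exponent = 0 selects greedy wrapping, so the first line takes
    # as many whole words as fit within `width`.
    w <- stri_wrap(s, width, cost_exponent = 0)
    if (length(w) > 1) stri_paste(w[1], "...") else w[1]
  }, character(1), USE.NAMES = FALSE)
}

cut_greedy(c("A short one.",
             "This is the longest sentence in world, so now just make it longer"))
```

vapply is used instead of sapply only so that the result is always a character vector.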
Performance note
Passing normalize = FALSE to stri_wrap can speed it up about 3 times (tested on 30,000 sentences).
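The speed-up is not free: normalize controls whether whitespace is cleaned up before wrapping. A small sketch of the difference, based on my reading of the stringi documentation (with normalize = FALSE the input should not contain line breaks); the sample string is made up:

```r
library(stringi)

# Made-up sample with irregular spacing and no line breaks
s <- "Lorem   ipsum  dolor sit   amet"

# normalize = TRUE: runs of whitespace collapse to a single space
# before the text is wrapped
normalized <- stri_wrap(s, 40, normalize = TRUE)

# normalize = FALSE: the text is wrapped as given, skipping that clean-up
raw <- stri_wrap(s, 40, normalize = FALSE)

normalized
raw
```

If the sentences are already clean, as in the test data here, skipping normalization should be safe.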
Test data:
    x <- stri_rand_lipsum(3000)
    x <- unlist(stri_split_regex(x,"(?<=\\.) "))
    head(x)
    [1] "Lorem ipsum dolor sit amet, vel commodo in."
    [2] "Ultricies mauris sapien lectus dignissim."
    [3] "Id pellentesque semper turpis habitasse egestas rutrum ligula vulputate laoreet mollis id."
    [4] "Curabitur volutpat efficitur parturient nibh sociosqu, faucibus tellus, eleifend pretium, quis."
    [5] "Feugiat vel mollis ultricies ut auctor."
    [6] "Massa neque auctor lacus ridiculus."
    stri_length(head(x))
    [1] 43 41 90 95 39 35

    cutAndAddDots <- function(x){
      w <- stri_wrap(x, 40, normalize = FALSE)
      if(length(w) > 1){
        stri_paste(w[1], "...")
      }else{
        w[1]
      }
    }

    cutAndAddDotsNormalize <- function(x){
      w <- stri_wrap(x, 40, normalize = TRUE)
      if(length(w) > 1){
        stri_paste(w[1], "...")
      }else{
        w[1]
      }
    }

    require(microbenchmark)
    microbenchmark(sapply(x, cutAndAddDots, USE.NAMES = FALSE),
                   sapply(x, cutAndAddDotsNormalize, USE.NAMES = FALSE),
                   times=3)
    Unit: seconds
                                                     expr       min        lq    median        uq       max
              sapply(x, cutAndAddDots, USE.NAMES = FALSE)  3.917858  3.967411  4.016964  4.055571  4.094178
     sapply(x, cutAndAddDotsNormalize, USE.NAMES = FALSE) 13.493732 13.651451 13.809170 13.917854 14.026538