Similar to Akrun's answer:
transform(data, result=grepl("[VG]{5,}", substr(string, 1, 20)))
This gives:
  class                                                  string result
1     a ASADSASAVVVVGVGGGSDASSSDDDFGDFGHFGHFGGGGGDDFFDDFGDFGTYJ   TRUE
2     b      AWEERTGVTHRGEFGDFSDFSGGGGGGDAWSDFAASDADAADWERWEQWD  FALSE
3     C       GRTVVGGVVVGGSWERGERVGEGDDFASDGGVQWEQWEQWERERYRYER   TRUE
Here we use grepl in combination with a character class that matches either "G" or "V" ( [VG] ) repeated 5 or more times ( {5,} ). substr restricts the search to the first 20 characters of each string. transform simply creates a new data frame with added or modified columns.
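To make the role of substr concrete, here is a minimal sketch. The data frame is reconstructed from the output shown above, since the original data object is not part of this answer, so treat it as an assumption. Row 2 does contain a run of six G's, but it starts after character 20, which is why searching the whole string would give a different result.

```r
# Sample data reconstructed from the printed output (an assumption;
# the original `data` object is not included in this answer).
data <- data.frame(
  class = c("a", "b", "C"),
  string = c(
    "ASADSASAVVVVGVGGGSDASSSDDDFGDFGHFGHFGGGGGDDFFDDFGDFGTYJ",
    "AWEERTGVTHRGEFGDFSDFSGGGGGGDAWSDFAASDADAADWERWEQWD",
    "GRTVVGGVVVGGSWERGERVGEGDDFASDGGVQWEQWEQWERERYRYER"
  ),
  stringsAsFactors = FALSE
)

# Only the first 20 characters are searched.
first20 <- substr(data$string, 1, 20)
hit <- grepl("[VG]{5,}", first20)

# Without substr, a run anywhere in the string counts as a match.
hit_anywhere <- grepl("[VG]{5,}", data$string)

# The runs that were actually matched within the first 20 characters.
runs <- regmatches(first20, regexpr("[VG]{5,}", first20))

print(hit)
print(hit_anywhere)
print(runs)
```

Comparing hit with hit_anywhere shows which rows are affected by the 20-character limit.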
EDIT: some benchmarks versus Matthew's creative answer:
set.seed(1)
# 1e5 random strings of length 20 to 300 over the alphabet V, G, A, S
string <- vapply(
  replicate(1e5, sample(c("V", "G", "A", "S"), sample(20:300, 1), replace=TRUE)),
  paste0, character(1L), collapse=""
)
library(microbenchmark)
microbenchmark(
  grepl("[VG]{5,}", substr(string, 1, 20)),
  grepl("^.{,15}[VG]{5,}", string),
  times=10
)
It produces:
Unit: milliseconds
                                     expr      min       lq     mean
 grepl("[VG]{5,}", substr(string, 1, 20)) 131.6668 131.8343 133.6644
         grepl("^.{,15}[VG]{5,}", string) 299.7326 300.4416 302.5065
I wasn't quite sure what to expect, but I think the result makes sense: substr is a very cheap operation, and it leaves the regex at most 20 characters to scan per string. Times are very close if the run of 5 or more repetitions occurs near the front of the string.
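As a sanity check that the two benchmarked expressions answer the same question, here is a small sketch on hand-built strings. The test strings are made up for illustration, and the anchored pattern is written with an explicit lower bound, {0,15}, as an assumed equivalent spelling of {,15}. A run of five fits inside the first 20 characters only if at most 15 characters precede it.

```r
# Hypothetical test strings, built with strrep() to keep positions exact.
strings <- c(
  paste0("GGGGG", strrep("A", 20)),                   # run at characters 1-5
  paste0(strrep("A", 15), "GGGGG", strrep("A", 5)),   # run at characters 16-20
  paste0(strrep("A", 16), "GGGGG", strrep("A", 4))    # run at characters 17-21
)

# Approach 1: cut to 20 characters, then search.
via_substr <- grepl("[VG]{5,}", substr(strings, 1, 20))

# Approach 2: anchor at the start and allow at most 15 leading characters.
via_anchor <- grepl("^.{0,15}[VG]{5,}", strings)

print(via_substr)
print(via_anchor)
```

Both approaches should agree on every string, including the boundary case where the run spills past character 20.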