Here is one way:
regmatches(tt, regexpr("[0-9].*[0-9]", tt))
I assume there are no other numbers in the file names. So we just search for the first digit and use the greedy operator .* so that everything up to the last digit is captured. This is done with regexpr, which returns the positions of the matches. Then we use regmatches to extract the (sub)strings at those matched positions.
where tt is:
tt <- c("Species Count (2011-12-15-07-09-39).xls", "Species Count 0511.xls", "Species Count 151112.xls", "Species Count1011.xls", "Species Count2012-01.xls", "Species Count201207.xls", "Species Count2013-01-15.xls")
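To make the two steps easier to follow, here is a small sketch that runs them separately on two of the file names from tt. The names x, m and res are only for illustration and are not part of the answer above.

```r
# Two of the file names from tt, kept short for illustration
x <- c("Species Count 0511.xls", "Species Count2013-01-15.xls")

# Step 1: regexpr returns the start position of the first match in each
# element; the match length is stored in the "match.length" attribute
m <- regexpr("[0-9].*[0-9]", x)
as.integer(m)            # 15 14
attr(m, "match.length")  # 4 10

# Step 2: regmatches uses those positions to pull out the substrings
res <- regmatches(x, m)
res                      # "0511" "2013-01-15"
```

One thing to keep in mind: regmatches drops elements that have no match, so the result can be shorter than the input if some file name contains fewer than two digits.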
Benchmarking:
Note: benchmark results may differ between Windows and *nix machines (as @Hansi notes in the comments below).
There are some nice answers here, so it's time for benchmarking :)
tt <- rep(tt, 1e5) # tt is from above
require(microbenchmark)
require(stringr)
# one function per answer
aa <- function() regmatches(tt, regexpr("[0-9].*[0-9]", tt))
bb <- function() gsub("[A-z \\.\\(\\)]", "", tt)
cc <- function() str_extract(tt,'([0-9]|[0-9][-])+')
microbenchmark(arun <- aa(), agstudy <- cc(), Jean <- bb(), times=25)
Unit: seconds
            expr      min       lq   median       uq       max neval
    arun <- aa() 1.951362 2.064055 2.198644 2.397724  3.236296    25
 agstudy <- cc() 2.489993 2.685285 2.991796 3.198133  3.762166    25
    Jean <- bb() 7.824638 8.026595 9.145490 9.788539 10.926665    25
identical(arun, agstudy) # TRUE
identical(arun, Jean) # TRUE