I use the str_count function from the stringr library with the escape sequence \w, which represents:
any "word" character (letter, digit or underscore in the current locale: in UTF-8 mode only ASCII letters and digits are considered)
Example:
    > str_count("How many words are in this sentence", '\\w+')
    [1] 7
Of the other 9 answers that I was able to test, only two (by Vincent Zoonekind and Petermensner) worked for all the inputs presented here so far, but they also require stringr.
But only this solution works with all the inputs presented so far, as well as inputs such as "foo+bar+baz~spam+eggs" or "Combien de mots sont dans cette phrase ?".
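A minimal sketch of why these two extra inputs separate the patterns, assuming stringr is installed. The variable names are illustrative only, and the French sentence is written with the space before "?" as in the benchmark. It compares `\\w+` with `\\S+`, one of the alternatives tested below.

```r
library(stringr)

# Inputs quoted in the text above (illustrative variable names)
plus_joined <- "foo+bar+baz~spam+eggs"
french <- "Combien de mots sont dans cette phrase ?"

# \\w+ counts runs of word characters, so punctuation acts as a separator
w_counts <- str_count(c(plus_joined, french), "\\w+")

# \\S+ counts runs of non-whitespace, so "+" and "~" glue words together
# and the free-standing "?" is counted as a word of its own
s_counts <- str_count(c(plus_joined, french), "\\S+")

print(w_counts)  # 5 7
print(s_counts)  # 1 8
```

The whitespace-based pattern undercounts the first input and overcounts the second, which is the failure mode the `\\w+` pattern avoids.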
Benchmark:
    library(stringr)

    questions <- c(
      "", "x", "x y", "x y!", "x y! z",
      "foo+bar+baz~spam+eggs",
      "one, two three 4,,,, 5 6",
      "How many words are in this sentence",
      "How many words are in this sentence",
      "Combien de mots sont dans cette phrase ?",
      " Day after day, day after day, We stuck, nor breath nor motion; "
    )

    answers <- c(0, 1, 2, 2, 3, 5, 6, 7, 7, 7, 12)

    score <- function(f) sum(unlist(lapply(questions, f)) == answers)

    funs <- c(
      function(s) sapply(gregexpr("\\W+", s), length) + 1,
      function(s) sapply(gregexpr("[[:alpha:]]+", s), function(x) sum(x > 0)),
      function(s) vapply(strsplit(s, "\\W+"), length, integer(1)),
      function(s) length(strsplit(gsub(' {2,}', ' ', s), ' ')[[1]]),
      function(s) length(str_match_all(s, "\\S+")[[1]]),
      function(s) str_count(s, "\\S+"),
      function(s) sapply(gregexpr("\\W+", s), function(x) sum(x > 0)) + 1,
      function(s) length(unlist(strsplit(s," "))),
      function(s) sapply(strsplit(s, " "), length),
      function(s) str_count(s, '\\w+')
    )

    unlist(lapply(funs, score))
Output:
    6 10 10 8 9 9 7 6 6 11
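For readers who want to avoid the stringr dependency, here is a rough base R sketch of the same idea. `count_words` is a hypothetical helper, not one of the benchmarked functions, and it assumes base R's default regex engine treats `\w` as letters, digits and underscore. It is only checked here against ASCII inputs.

```r
# Hypothetical helper, not part of the benchmark above
count_words <- function(s) {
  # gregexpr() locates every run of word characters, regmatches() extracts
  # them (an empty vector when nothing matches), lengths() counts per string
  lengths(regmatches(s, gregexpr("\\w+", s)))
}

base_counts <- count_words(c("", "foo+bar+baz~spam+eggs", "one, two three 4,,,, 5 6"))
print(base_counts)  # 0 5 6
```

Behaviour on non-ASCII text may depend on the locale, so this should be checked against real inputs before relying on it.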