Counting the occurrences of a specific letter in a vector of words in R

I am trying to count how many times a specific letter occurs in each element of a long vector of words.

For example:

I would like to count the number of letters "A" in each element of the following vector.

myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL") 

Thus, the expected output:

 c(1, 0, 1, 0, 3, 2, 1)

Any idea?

+6
4 answers

Another possibility:

 myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")
 sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
 ## [1] 1 0 1 0 3 2 1
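The `sum(x > -1)` step relies on how `gregexpr` reports matches: each list element holds the start positions of the matches, or a single `-1` when there is no match. A minimal sketch of that behaviour follows, together with an equivalent base R variant using `regmatches` and `lengths` (that variant is an illustration, not part of the answer above):

```r
myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")

m <- gregexpr("A", myvec, fixed = TRUE)

# Start positions of each match; as.integer() drops the match attributes
as.integer(m[[5]])  # "BANANA": 2 4 6
as.integer(m[[2]])  # "KILLS": -1, meaning no match

# regmatches() returns character(0) for elements without a match,
# so lengths() yields 0 for them
counts <- lengths(regmatches(myvec, m))
counts
```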

EDIT: This was begging for a benchmark:

 library(stringr); library(stringi); library(microbenchmark); library(qdapDictionaries)

 myvec <- toupper(GradyAugmented)

 GREGEXPR <- function() sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
 GSUB <- function() nchar(gsub("[^A]", "", myvec))
 STRSPLIT <- function() sapply(strsplit(myvec,""), function(x) sum(x=='A'))
 STRINGR <- function() str_count(myvec, "A")
 STRINGI <- function() stri_count(myvec, fixed="A")
 VAPPLY_STRSPLIT <- function() vapply(strsplit(myvec,""), function(x) sum(x=='A'), integer(1))

 (op <- microbenchmark(
     GREGEXPR(),
     GSUB(),
     STRINGI(),
     STRINGR(),
     STRSPLIT(),
     VAPPLY_STRSPLIT(),
 times=50L))

 ## Unit: milliseconds
 ##               expr        min         lq       mean     median        uq        max neval
 ##         GREGEXPR() 477.278895 631.009023 688.845407 705.878827 745.73596  906.83006    50
 ##             GSUB() 197.127403 202.313022 209.485179 205.538073 208.90271  270.19368    50
 ##          STRINGI()   7.854174   8.354631   8.944488   8.663362   9.32927   11.19397    50
 ##          STRINGR() 618.161777 679.103777 797.905086 787.554886 906.48192 1115.59032    50
 ##         STRSPLIT() 244.721701 273.979330 331.281478 294.944321 348.07895  516.47833    50
 ##  VAPPLY_STRSPLIT() 184.042451 206.049820 253.430502 219.107882 251.80117  595.02417    50

 boxplot(op)

And stringi wins by a wide margin: its median is about 8.7 ms, against roughly 205 ms for the next-fastest approach. vapply + strsplit was a good approach, as was the simple gsub approach. Interesting results for sure.
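A timing comparison like this is only meaningful if the approaches return the same counts. A minimal sketch of that check, using only the base R variants on the small example vector from the question (the stringr and stringi calls are left out so it runs without extra packages):

```r
myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")

results <- list(
  gregexpr = sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1)),
  gsub     = nchar(gsub("[^A]", "", myvec)),
  strsplit = vapply(strsplit(myvec, ""), function(x) sum(x == "A"), integer(1))
)

# TRUE if every approach returns the same counts as the first one
agree <- all(vapply(results, function(r) all(r == results[[1]]), logical(1)))
agree
```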


+7

For a short base R solution, try the following:

 nchar(gsub("[^A]", "", myvec))
 # [1] 1 0 1 0 3 2 1
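The character class `[^A]` matches every character except `A`, so `gsub` deletes everything else and `nchar` counts what remains. A related sketch that works for any literal pattern is to remove the pattern and compare string lengths. Note that `count_fixed` is a hypothetical helper name introduced here for illustration, not something from the answers in this thread:

```r
# Count non-overlapping occurrences of a literal pattern in each element.
# count_fixed is a hypothetical helper, not a base R function.
count_fixed <- function(x, pattern) {
  stopifnot(nchar(pattern) > 0)
  removed <- gsub(pattern, "", x, fixed = TRUE)
  (nchar(x) - nchar(removed)) %/% nchar(pattern)
}

myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")
count_fixed(myvec, "A")   # 1 0 1 0 3 2 1
count_fixed(myvec, "AN")  # 0 0 0 0 2 0 0
```

With `fixed = TRUE` the pattern is taken literally, so characters such as `.` or `^` need no escaping.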
+8
 library(stringr)
 str_count(myvec, "A")
 # [1] 1 0 1 0 3 2 1

or

 library(stringi)
 stri_count(myvec, fixed="A")
 # [1] 1 0 1 0 3 2 1

or

 vapply(strsplit(myvec,""), function(x) sum(x=='A'), integer(1))
 # [1] 1 0 1 0 3 2 1
+7

sapply can also be used:

 sapply(strsplit(myvec,""), function(x) sum(x=='A'))
 # [1] 1 0 1 0 3 2 1
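One practical difference between the sapply and vapply variants shows up on empty input: sapply has nothing to simplify and returns an empty list, while vapply always returns the declared type. A minimal sketch, where `empty` stands in for a word vector that happens to have no elements:

```r
empty <- character(0)

# sapply() falls back to an empty list when there is nothing to simplify
s <- sapply(strsplit(empty, ""), function(x) sum(x == "A"))

# vapply() always returns the declared type, here an integer vector
v <- vapply(strsplit(empty, ""), function(x) sum(x == "A"), integer(1))

class(s)  # "list"
class(v)  # "integer"
```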
0

Source: https://habr.com/ru/post/977559/

