Counting the occurrences of "0" in a factor

Consider the following factor:

```r
x = factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))
```

I would like to count the number of occurrences of the character "0" in this factor. The only solution I have found so far is:

```r
sum(grepl("0", strsplit(paste(sapply(x, as.character), collapse = ""), split = "")[[1]]))
# [1] 4
```

This feels overly complicated for such a simple task. Is there a "better" alternative? (The operation will be repeated about 100,000 times on factors of 2,000 elements, so performance matters too.)

2 answers
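```r
x = factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))
x
# [1] 1|1 1|0 1|1 1|1 0|0 1|1 0|1
# Levels: 0|0 0|1 1|0 1|1

# "|" is treated as a regex that matches the empty string, so strsplit()
# splits each element into single characters; grep() then counts the "0"s.
sum(unlist(lapply(strsplit(as.character(x), "|"), function(x) length(grep('0', x)))))
# [1] 4
```

or

```r
# Drop every "1", space and "|"; only the zeros remain
sum(nchar(gsub("[1 |]", '', x)))
# [1] 4
```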

Based on a comment by @Rich Scriven, a more general pattern removes everything that is not a "0":

```r
sum(nchar(gsub("[^0]", '', x)))
# [1] 4
```
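Dropping the outer `sum()` from the same `gsub()` call gives a per-element count, which can be handy for checking the result. A minimal sketch using the question's `x` (the name `zeros_per_element` is just illustrative):

```r
x <- factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))

# Remove every character that is not "0"; the remaining length is the count
zeros_per_element <- nchar(gsub("[^0]", "", as.character(x)))
zeros_per_element
# [1] 0 1 0 0 2 0 1

sum(zeros_per_element)
# [1] 4
```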

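Based on a comment by @thelatemail: running `gsub()` only on the levels and weighting each level by its frequency from `tabulate()` is much faster than the solution above, because the string work is done once per level instead of once per element. Here is a comparison.

```r
# Count the zeros in each level once, then weight by how often each level occurs.
# nbins = nlevels(x) keeps tabulate() aligned with levels(x) even if the last levels are unused.
sum(nchar(gsub("[^0]", "", levels(x))) * tabulate(x, nbins = nlevels(x)))
# [1] 4
```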

Timing:

```r
library(stringr)  # needed for str_count() below

x2 <- sample(x, 1e7, replace = TRUE)

system.time(sum(nchar(gsub("[^0]", '', x2))))
#    user  system elapsed
#   14.24    0.22   14.65

system.time(sum(nchar(gsub("[^0]", "", levels(x2))) * tabulate(x2)))
#    user  system elapsed
#    0.04    0.00    0.04

system.time(sum(str_count(x2, fixed("0"))))
#    user  system elapsed
#    1.02    0.13    1.25
```
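The levels-plus-`tabulate()` trick can be wrapped in a small helper. This is a sketch, not code from the thread: the name `count_char_in_factor` is hypothetical, and counting via `nchar()` before and after a fixed `gsub()` is a swapped-in way to count any single character. It passes `nbins = nlevels(f)` because, by default, `tabulate()` only returns as many bins as the highest level code actually present; if the last levels are unused, multiplying by the per-level counts silently recycles and gives a wrong total.

```r
# Hypothetical helper: count occurrences of a single character `ch`
# across all elements of factor `f`, doing the string work once per level.
count_char_in_factor <- function(f, ch) {
  lev <- levels(f)
  # Occurrences of `ch` in each level: length minus length with `ch` removed
  per_level <- nchar(lev) - nchar(gsub(ch, "", lev, fixed = TRUE))
  # How often each level occurs; nbins keeps this the same length as per_level
  counts <- tabulate(f, nbins = nlevels(f))
  sum(per_level * counts)
}

x <- factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))
count_char_in_factor(x, "0")
# [1] 4

# A factor whose last two levels never occur
f <- factor(c("0|0", "0|1"), levels = c("0|0", "0|1", "1|0", "1|1"))
count_char_in_factor(f, "0")
# [1] 3

# Without nbins, tabulate(f) has only 2 bins and gets recycled against 4 levels
sum(nchar(gsub("[^0]", "", levels(f))) * tabulate(f))
# [1] 4
```

If all 100,000 factors in the question share the same levels, `per_level` could presumably be computed once outside the loop, leaving only `tabulate()` to run per factor.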

Here are three options.

Option 1: scan() the vector with sep="|"

```r
sum(scan(text = as.character(x), sep = "|") == 0)
# [1] 4
```
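Note that Option 1 counts *fields* equal to 0 rather than "0" characters. That makes no difference for the question's single-digit data, but a sketch with a hypothetical multi-digit field shows where the two diverge (`quiet = TRUE` just suppresses scan()'s "Read N items" message):

```r
# Hypothetical input where one field has more than one digit
s <- c("10|0", "1|1")

# scan() parses the fields as numbers: 10, 0, 1, 1
fields <- scan(text = s, sep = "|", quiet = TRUE)
sum(fields == 0)
# [1] 1

# Counting characters instead sees both zeros in "10|0"
sum(nchar(gsub("[^0]", "", s)))
# [1] 2
```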

Option 2: fixed-string matching in gregexpr()

```r
sum(unlist(gregexpr("0", x, fixed = TRUE)) > 0)
# [1] 4
```
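The `> 0` in Option 2 is there because `gregexpr()` stores -1 for elements with no match. A small sketch of that, plus one way (an addition, not from the thread) to get per-element counts from the same match object via `regmatches()` and `lengths()`:

```r
x <- factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))
s <- as.character(x)

m <- gregexpr("0", s, fixed = TRUE)
# "1|1" has no "0", so its match position is -1
m[[1]][1]
# [1] -1

# regmatches() returns character(0) for non-matching elements,
# so lengths() gives the number of matches per element
zeros_per_element <- lengths(regmatches(s, m))
zeros_per_element
# [1] 0 1 0 0 2 0 1
```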

Option 3: str_count() from the stringr package with a fixed() pattern, which is very simple and fast

```r
library(stringr)
sum(str_count(x, fixed("0")))
# [1] 4
```
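Before comparing timings it is worth checking that the approaches agree. A rough sketch using only base R (stringr's `str_count()` is left out so it runs without extra packages; the sample size and seed are arbitrary choices):

```r
x <- factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))
set.seed(42)
x2 <- sample(x, 1e4, replace = TRUE)

# Base-R approaches from this thread, each returning a single count
approaches <- list(
  gsub_all    = function(f) sum(nchar(gsub("[^0]", "", f))),
  gsub_levels = function(f) sum(nchar(gsub("[^0]", "", levels(f))) *
                                  tabulate(f, nbins = nlevels(f))),
  scan_fields = function(f) sum(scan(text = as.character(f), sep = "|",
                                     quiet = TRUE) == 0),
  gregexpr    = function(f) sum(unlist(gregexpr("0", f, fixed = TRUE)) > 0)
)

# All entries should be equal
counts <- vapply(approaches, function(fn) fn(x2), numeric(1))
counts
```

Wrapping each call in `system.time()` on a larger sample is then a quick way to compare speed on your own machine.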

Source: https://habr.com/ru/post/1264912/

