The number of unique values per row

I want to count the number of unique values in each row.

For example, with this data frame:

example <- data.frame(var1 = c(2,3,3,2,4,5), var2 = c(2,3,5,4,2,5), var3 = c(3,3,4,3,4,5)) 

I want to add a column that counts the number of unique values in each row: for example, 2 for the first row (it contains only the values 2 and 3) and 1 for the second row (it contains only the value 3).

Does anyone know a simple way to do this? So far, I have only found code that counts the number of unique values in each column.

2 answers

This apply call returns a vector with the number of unique values in each row:

 apply(example, 1, function(x) length(unique(x)))

You can add it to your data.frame in either of the following two ways (here the new column is named count):

 example <- cbind(example, count = apply(example, 1, function(x) length(unique(x))))

or

 example$count <- apply(example, 1, function(x) length(unique(x)))
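As a minimal end-to-end sketch of the approach above, the snippet below rebuilds the question's data frame and adds the count column. The `value_cols` vector is a name introduced here for illustration only; restricting `apply` to those columns is an assumption about what you want, made so that re-running the line does not also count the freshly added `count` column.

```r
# Data frame from the question
example <- data.frame(var1 = c(2, 3, 3, 2, 4, 5),
                      var2 = c(2, 3, 5, 4, 2, 5),
                      var3 = c(3, 3, 4, 3, 4, 5))

# Hypothetical helper name: the columns whose values should be compared
value_cols <- c("var1", "var2", "var3")

# apply() with MARGIN = 1 calls the function once per row
example$count <- apply(example[value_cols], 1, function(x) length(unique(x)))

print(example$count)
# [1] 2 1 3 3 2 1
```

Note that `apply` converts the data frame to a matrix first, so if the selected columns mix types (say, numbers and strings), every value is coerced to a common type before `unique` sees it.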

We can also use a vectorized approach with a regex. After pasting the elements of each row together ( do.call(paste0, ...) ), we match any character and capture it as a group ( (.) ), then use a positive lookahead ( (?=.*?\\1) ) so that the character matches only if it appears again later in the string ( \\1 is the backreference to the captured group). Each match is replaced with an empty string ( "" ), so only one copy of each distinct character remains. Finally, nchar counts the characters left in each string. This relies on every value being a single character, as in the example data.

 example$count <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, example), perl = TRUE))
 example$count
 #[1] 2 1 3 3 2 1
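To make the regex answer easier to follow, here is a sketch that splits it into its three steps and then probes one limitation. The intermediate names (`pasted`, `deduped`, `regex_count`) and the one-row `multi` data frame are illustrative additions, not part of the original answer.

```r
example <- data.frame(var1 = c(2, 3, 3, 2, 4, 5),
                      var2 = c(2, 3, 5, 4, 2, 5),
                      var3 = c(3, 3, 4, 3, 4, 5))

# Step 1: paste the values of each row into a single string
pasted <- do.call(paste0, example)
# "223" "333" "354" "243" "424" "555"

# Step 2: delete every character that occurs again later in the same string
deduped <- gsub("(.)(?=.*?\\1)", "", pasted, perl = TRUE)
# "23" "3" "354" "243" "24" "5"

# Step 3: the remaining characters are the distinct ones, so count them
regex_count <- nchar(deduped)

# Limitation: the regex compares characters, not values.
# Hypothetical row holding the values 12, 1 and 2 (three distinct values).
multi <- data.frame(x = 12, y = 1, z = 2)
regex_multi <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, multi), perl = TRUE))
apply_multi <- apply(multi, 1, function(v) length(unique(v)))
# regex_multi is 2 because "1212" collapses to "12"; apply_multi is 3
```

For data with multi-digit numbers, negative numbers, or decimals, the `apply` version is therefore the safer choice.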

Source: https://habr.com/ru/post/982785/

