The number of unique rows based on two columns by group

I have data.table in r

col1 col2 col3 col4 1: 5.1 3.5 1.4 setosa 2: 5.1 3.5 1.4 setosa 3: 4.7 3.2 1.3 setosa 4: 4.6 3.1 1.5 setosa 5: 5.0 3.6 1.4 setosa 6: 5.1 3.5 3.4 eer 7: 5.1 3.5 3.4 eer 8: 5.1 3.2 1.3 eer 9: 5.1 3.5 1.5 eer 10: 5.1 3.5 1.4 eer DT <- structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 5.1, 5.1, 5.1), col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5, 3.5), col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4 ), col4 = structure(c(1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c("setosa", "versicolor", "virginica", "eer"), class = "factor")), .Names = c("col1", "col2", "col3", "col4"), row.names = c(NA, -10L), class = c("data.table", "data.frame")) 

I want to count unique (excellent) combinations of col1 and col2 for each col4 value.

Expected Result

  col1 col2 col3 col4 count 1: 5.1 3.5 1.4 setosa 4 2: 5.1 3.5 1.4 setosa 4 3: 4.7 3.2 1.3 setosa 4 4: 4.6 3.1 1.5 setosa 4 5: 5.0 3.6 1.4 setosa 4 6: 5.1 3.5 3.4 eer 2 7: 5.1 3.5 3.4 eer 2 8: 5.1 3.2 1.3 eer 2 9: 5.1 3.5 1.5 eer 2 10: 5.1 3.5 1.4 eer 2 

How to do this in only 1 data.table syntax?

+5
source share
1 answer

At first I had to go through several attempts, and in the end I succeeded. Any good?

 DT[, count:=nrow(unique(.SD)), by=col4, .SDcols=c("col1","col2")] DT col1 col2 col3 col4 count 1: 5.1 3.5 1.4 setosa 4 2: 5.1 3.5 1.4 setosa 4 3: 4.7 3.2 1.3 setosa 4 4: 4.6 3.1 1.5 setosa 4 5: 5.0 3.6 1.4 setosa 4 6: 5.1 3.5 3.4 eer 2 7: 5.1 3.5 3.4 eer 2 8: 5.1 3.2 1.3 eer 2 9: 5.1 3.5 1.5 eer 2 10: 5.1 3.5 1.4 eer 2 > 

the same, but faster thanks to the comment of Procrastinatus below:

 DT[, count:=uniqueN(.SD), by=col4, .SDcols=c("col1","col2")] 
+14
source

Source: https://habr.com/ru/post/1245815/


All Articles