I am working with a data set of 10,000 people and 8 binary (0/1) variables. Each variable indicates whether a person took part in a given survey module (1) or not (0). In total, 2^8 = 256 combinations of 0 and 1 across the eight variables are possible for each person.
Goal: I want to group people who have identical rows, i.e. individuals who participated in the same modules.
My data looks like this (reduced to three variables for the example):
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1:1, 4))
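# Distinct combinations of v1..v3 that occur in the data
unique(dat[ , -1])
# Number of people sharing each combination
library(plyr)
ddply(dat[ , -1], .(v1, v2, v3), nrow)
# The kind of grouping variable I am after (written by hand here)
dat$v4 <- rep(c("group1", "group2"), 4)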
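For illustration, here is a minimal sketch of one way such a grouping variable might be derived automatically in base R instead of typing it by hand. It assumes every column except `id` is one of the indicator variables; the names `key` and `group` are hypothetical and chosen only for this example. Each row is collapsed into a single string, and the distinct strings are numbered in order of first appearance.

```r
# Example data from the question (v3 is constant at 1)
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1L, 4))

# Collapse the indicator columns of each row into one key, e.g. "0_1_1"
# (assumption: all columns except the first are indicators)
key <- do.call(paste, c(dat[, -1], sep = "_"))

# Number the distinct keys in order of first appearance and label them
dat$group <- paste0("group", match(key, unique(key)))

# How many people fall into each group
table(dat$group)
```

Because the key is built from whatever columns are passed in, the same lines should carry over to the full data with eight variables, where at most 256 distinct groups can occur.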