I have a data frame, puzzle, listing clients and the type of item each one owns. A client can appear several times in the list if they own several items.
name type
m1 A
m10 A
m2 A
m9 A
m9 B
m4 B
m5 B
m1 C
m2 C
m3 C
m4 C
m5 C
m6 C
m7 C
m8 C
m1 D
m5 D
I would like to calculate what percentage of people who own "A" also have "B", etc.
Based on the above input, how can I get this output using R:
   A    B    C    D    TOTAL
A  1    0.25 0.5  0.25 4
B  0.33 1    0.67 0.33 3
C  0.25 0.25 1    0.25 8
D  0.5  0.5  1    1    2
Many thanks for your help!
Here is a long, manual way to do this without any loops or advanced functions (though, of course, it does not make use of R's full potential):
Example for item A (with item B subset the same way for the comparison):
puzzleA <- subset(puzzle, type == 'A')
puzzleB <- subset(puzzle, type == 'B')
Share of clients who own A and also own B:
length(unique(merge(puzzleA, puzzleB, by = 'name')$name)) / length(unique(puzzleA$name))
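The pairwise calculation above can be extended to every combination of types at once. The following is only a sketch of one possible approach, not the method used in this answer: it swaps the merge for crossprod() on the client-by-type contingency table. It assumes each (name, type) pair should be counted once, and the object names tab, co, shares and result are made up for illustration. The data frame is rebuilt here so that the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
puzzle <- data.frame(
  name = c("m1", "m10", "m2", "m9", "m9", "m4", "m5", "m1", "m2",
           "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"),
  type = c("A", "A", "A", "A", "B", "B", "B", "C", "C",
           "C", "C", "C", "C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Client-by-type table of 0/1 flags; unique() guards against duplicated rows
# (assumption: a repeated name/type pair should only count once)
tab <- table(unique(puzzle))

# t(tab) %*% tab: cell [i, j] is the number of clients owning both i and j,
# and the diagonal holds the number of clients owning each type
co <- crossprod(tab)

# Divide each row by its own diagonal entry to get row-wise shares
shares <- co / diag(co)

# Attach the per-type client counts as a TOTAL column
result <- cbind(round(shares, 2), TOTAL = diag(co))
result
```

Each row of result should match the corresponding row of the table in the question, for example 1, 0.25, 0.5, 0.25 and a total of 4 for row A.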
Data
puzzle <- structure(list(name = c("m1", "m10", "m2", "m9", "m9", "m4",
"m5", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"
), type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
"C", "C", "C", "C", "C", "D", "D")), .Names = c("name", "type"
), class = "data.frame", row.names = c(NA, -17L))