I have a data set in which I need to look at all pairs of items that occur together within the same group. Below is a toy example to explain this.
BUNCH   FRUITS
1       apples
1       bananas
1       mangos
2       apples
3       bananas
3       apples
4       bananas
4       apples
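To reproduce the example, the toy data could be loaded with a DATA step along these lines. This is only a sketch: the data set name FRUITS and the column name FRUIT are assumptions chosen to match the query in the solution further down, not something stated alongside the table.

```sas
/* Toy data from the table above.                                 */
/* Assumption: the data set is called FRUITS and the character    */
/* column is called FRUIT, matching the query in the solution.    */
data FRUITS;
    input BUNCH FRUIT $;   /* FRUIT is read as character (default length 8) */
    datalines;
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
;
run;
```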
What I want is a list of all possible pairs, with a count of how often each pair occurs together in the same bunch. My output would ideally look like this:
FRUIT1   FRUIT2    FREQUENCY
APPLES   BANANAS   3
APPLES   MANGOS    1
My ultimate goal is to produce something that I can import into Gephi for network analysis. For this I need a Source and a Target column (FRUIT1 and FRUIT2 above).
I think there are other ways to approach this without using PROC SQL (possibly using PROC TRANSPOSE), but this is where I started.
SOLUTION
Thanks for the help. The sample code below is for anyone interested in something similar:
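    proc sql;
        create table fruit_combo as
        select a.FRUIT as FRUIT1,
               b.FRUIT as FRUIT2,
               count(*) as FREQUENCY
        from FRUITS a, FRUITS b
        /* self-join: pair up different fruits within the same bunch */
        where a.BUNCH = b.BUNCH
          and a.FRUIT ne b.FRUIT
        /* note: each pair appears in both orders (A-B and B-A) */
        group by FRUIT1, FRUIT2;
    quit;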
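For an undirected network it may be preferable to keep each pair only once. A minimal sketch of that variant, assuming the same FRUITS table with BUNCH and FRUIT columns and that each fruit appears at most once per bunch: comparing with `a.FRUIT < b.FRUIT` keeps only the alphabetically ordered version of each pair. The names fruit_edges and fruit_edges.csv are hypothetical, and the Source, Target and Weight headers are, as far as I know, the column names Gephi's spreadsheet import recognizes for an edge table.

```sas
/* Sketch: one row per unordered pair, with Gephi-style column names. */
/* Assumes each fruit occurs at most once per bunch.                  */
proc sql;
    create table fruit_edges as
    select a.FRUIT  as Source,
           b.FRUIT  as Target,
           count(*) as Weight
    from FRUITS a, FRUITS b
    where a.BUNCH = b.BUNCH
      and a.FRUIT < b.FRUIT      /* keeps A-B, drops B-A and A-A */
    group by a.FRUIT, b.FRUIT;
quit;

/* Write the edge list to a CSV file (hypothetical path). */
proc export data=fruit_edges
    outfile="fruit_edges.csv"
    dbms=csv
    replace;
run;
```

On the toy data this should give three rows: apples/bananas with weight 3, apples/mangos with weight 1, and bananas/mangos with weight 1.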