I'm using pandas to create a DataFrame that looks like this:
import pandas
ratings = pandas.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])
From this input I would like to compute a new matrix that compares each row with every other row. Suppose, for example, that the comparison is the size of the intersection of the two rows' sets of articles (the columns marked 1). Then I would like a DataFrame whose first row contains len(intersection(Alice,Bob)), len(intersection(Alice,Carol)) and len(intersection(Alice,Dave)), and whose remaining rows follow the same pattern for Bob, Carol and Dave. For this example input, the output matrix would be 4x3:
len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), len(intersection(Alice,Dave))
len(intersection(Bob,Alice)), len(intersection(Bob,Carol)), len(intersection(Bob,Dave))
len(intersection(Carol,Alice)), len(intersection(Carol,Bob)), len(intersection(Carol,Dave))
len(intersection(Dave,Alice)), len(intersection(Dave,Bob)), len(intersection(Dave,Carol))
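As a baseline, this layout can be produced for any comparison function with a plain loop over ordered pairs of distinct rows. This is a minimal sketch, not a built-in pandas method: it assumes a 1 means the person has that article, treats each row as the set of such column labels, and uses two hypothetical helpers, `intersection_size` and `compare_rows`. The number of Python calls grows quadratically with the number of rows, so it is mostly useful for checking faster versions or for functions that cannot be vectorized.

```python
import pandas as pd

ratings = pd.DataFrame(
    {
        "article_a": [1, 1, 0, 0],
        "article_b": [1, 0, 0, 0],
        "article_c": [1, 0, 0, 0],
        "article_d": [0, 0, 0, 1],
        "article_e": [0, 0, 0, 1],
    },
    index=["Alice", "Bob", "Carol", "Dave"],
)


def intersection_size(a, b):
    """Hypothetical comparison function: size of the intersection of two sets."""
    return len(a & b)


def compare_rows(df, func):
    """Hypothetical helper: apply func to every ordered pair of distinct rows.

    Each row is first turned into the set of column labels where it is nonzero.
    Row i of the result holds func(set_i, set_j) for every j != i, in the
    original row order, so the result has shape (n, n - 1).
    """
    # Map each index label to the set of columns that are nonzero in that row.
    sets = {name: set(row[row != 0].index) for name, row in df.iterrows()}
    names = list(df.index)
    result = [
        [func(sets[a], sets[b]) for b in names if b != a]
        for a in names
    ]
    return pd.DataFrame(result, index=df.index)


out = compare_rows(ratings, intersection_size)
print(out)
```

Any other function of two sets can be passed as `func` without changing `compare_rows`.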
Is there a built-in pandas method for this kind of computation? What would be the most efficient way to do it?
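For this particular function, and assuming the values are exactly 0 or 1, the intersection size of two rows equals their dot product, so the whole comparison reduces to a single matrix product, which pandas provides as `DataFrame.dot`. The sketch below relies on that 0/1 assumption and does not generalize to arbitrary comparison functions; the off-diagonal reshape is one way to reproduce the n x (n-1) layout from the question.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {
        "article_a": [1, 1, 0, 0],
        "article_b": [1, 0, 0, 0],
        "article_c": [1, 0, 0, 0],
        "article_d": [0, 0, 0, 1],
        "article_e": [0, 0, 0, 1],
    },
    index=["Alice", "Bob", "Carol", "Dave"],
)

# For 0/1 data, the dot product of two rows counts the articles both rows have,
# i.e. the size of the intersection of their article sets.
# DataFrame.dot aligns ratings' columns with ratings.T's index (the article
# labels), so the result is an n x n DataFrame labelled by person on both axes.
overlap = ratings.dot(ratings.T)

# Remove self-comparisons (the diagonal). Boolean masking of a 2-D array walks
# it in row-major order, so row i keeps its entries for every j != i in the
# original order, matching the layout in the question.
n = len(ratings)
off_diagonal = ~np.eye(n, dtype=bool)
pairwise = pd.DataFrame(
    overlap.to_numpy()[off_diagonal].reshape(n, n - 1),
    index=ratings.index,
)
print(pairwise)
```

Keeping `overlap` as the square, fully labelled matrix is often more convenient than the 4x3 form, since each column then has a single meaning (the other person); the diagonal simply holds each row's own count.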