Pandas - apply a function to the current row against all other rows

I use pandas to create a data frame that looks like this:

    import pandas

    ratings = pandas.DataFrame({
        'article_a': [1, 1, 0, 0],
        'article_b': [1, 0, 0, 0],
        'article_c': [1, 0, 0, 0],
        'article_d': [0, 0, 0, 1],
        'article_e': [0, 0, 0, 1],
    }, index=['Alice', 'Bob', 'Carol', 'Dave'])

I would like to compute another matrix from this input, one that compares each row with every other row. Suppose, for example, that the calculation is the size of a set intersection. I would like to get a DataFrame with len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), len(intersection(Alice,Dave)) in the first row, with each subsequent row following the same pattern. For this example input, the output matrix will be 4x3:

 len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), len(intersection(Alice,Dave))
 len(intersection(Bob,Alice)), len(intersection(Bob,Carol)), len(intersection(Bob,Dave))
 len(intersection(Carol,Alice)), len(intersection(Carol,Bob)), len(intersection(Carol,Dave))
 len(intersection(Dave,Alice)), len(intersection(Dave,Bob)), len(intersection(Dave,Carol))
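To pin down the target, the computation can be written as a brute-force loop. This is only a reference sketch: it assumes each row stands for the set of articles marked 1, and `pairwise_intersection_sizes` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

ratings = pd.DataFrame(
    {
        "article_a": [1, 1, 0, 0],
        "article_b": [1, 0, 0, 0],
        "article_c": [1, 0, 0, 0],
        "article_d": [0, 0, 0, 1],
        "article_e": [0, 0, 0, 1],
    },
    index=["Alice", "Bob", "Carol", "Dave"],
)


def pairwise_intersection_sizes(df):
    # Hypothetical helper: treat each row as the set of columns holding a 1.
    sets = {name: set(row[row == 1].index) for name, row in df.iterrows()}
    result = {}
    for a in df.index:
        # Compare row `a` against every other row, skipping itself,
        # which gives n - 1 values per row (the 4x3 shape described above).
        result[a] = [len(sets[a] & sets[b]) for b in df.index if b != a]
    return result


sizes = pairwise_intersection_sizes(ratings)
```

The loop is quadratic in the number of rows and runs in pure Python, so it is useful mainly as a specification to check faster approaches against.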

Is there a name for this kind of computation in pandas? What would be the most efficient way to do it?

2 answers

I don't know of a named method, but I have a one-liner:

    In [21]: ratings.apply(lambda row: ratings.apply(
        ...:     lambda x: np.equal(row, x), 1).sum(1), 1)
    Out[21]:
           Alice  Bob  Carol  Dave
    Alice      5    3      2     0
    Bob        3    5      4     2
    Carol      2    4      5     3
    Dave       0    2      3     5
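Note that the one-liner above counts the columns in which two rows hold equal values, so shared zeros count as matches too (Alice and Carol score 2 although Carol rated nothing). The nested `apply` also calls a Python lambda for every pair of rows. A possible vectorized sketch of the same counts uses NumPy broadcasting, assuming the frame fits in memory as a dense array; the names `equal`, `matches` and `shared_ones` are illustrative. For 0/1 data, a matrix product is one way to count only shared 1s, which is closer to the set intersection in the question.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {
        "article_a": [1, 1, 0, 0],
        "article_b": [1, 0, 0, 0],
        "article_c": [1, 0, 0, 0],
        "article_d": [0, 0, 0, 1],
        "article_e": [0, 0, 0, 1],
    },
    index=["Alice", "Bob", "Carol", "Dave"],
)

values = ratings.to_numpy()

# Compare every row with every row: the result has shape (n_rows, n_rows, n_cols).
equal = values[:, None, :] == values[None, :, :]

# Summing over the column axis counts matching positions for each pair of rows.
matches = pd.DataFrame(equal.sum(axis=2), index=ratings.index, columns=ratings.index)

# Assumes strictly 0/1 entries: the dot product counts columns where both rows hold a 1.
shared_ones = ratings.dot(ratings.T)
```

Both results include the diagonal (each row compared with itself); it can be ignored or masked if only comparisons with other rows are wanted.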

@Dan Allan's solution is "right"; here is a slightly different approach to the problem:

    In [26]: ratings
    Out[26]:
           article_a  article_b  article_c  article_d  article_e
    Alice          1          1          1          0          0
    Bob            1          0          0          0          0
    Carol          0          0          0          0          0
    Dave           0          0          0          1          1

    In [27]: ratings.apply(lambda x: (ratings.T.sub(x,'index')).sum(),1)
    Out[27]:
           Alice  Bob  Carol  Dave
    Alice      0   -2     -3    -1
    Bob        2    0     -1     1
    Carol      3    1      0     2
    Dave       1   -1     -2     0
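Because the subtraction is summed over all articles, each entry in this output works out to the column person's row total minus the row person's row total. It therefore measures how many more articles one person rated than another, not how much their ratings overlap. A short sketch that reproduces the same table from the row totals alone (the names `totals` and `diff` are illustrative):

```python
import pandas as pd

ratings = pd.DataFrame(
    {
        "article_a": [1, 1, 0, 0],
        "article_b": [1, 0, 0, 0],
        "article_c": [1, 0, 0, 0],
        "article_d": [0, 0, 0, 1],
        "article_e": [0, 0, 0, 1],
    },
    index=["Alice", "Bob", "Carol", "Dave"],
)

# Number of articles each person rated 1.
totals = ratings.sum(axis=1).to_numpy()

# Entry (i, j) is totals[j] - totals[i], built with broadcasting.
diff = pd.DataFrame(
    totals[None, :] - totals[:, None],
    index=ratings.index,
    columns=ratings.index,
)
```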

Source: https://habr.com/ru/post/1484473/

