How does the statistical calculation of "similar products / music / ..." from shopping / listening to customers work?

Question

How does the statistical calculation of "similar products / music / ..." from shopping / listening to customers work?

I mean product offerings on Amazon or, more specifically, a similar recommendation from the group at Last.fm.

Given that you can maintain the full listener / purchase behavior of your users (WHO listened to WHAT how OFTEN?), How do you calculate which groups are similar to any given ranges, and how many?

I found some sites on Wikipedia ( Teaching association rules , Affinity analysis ), but I would like to get some information from the point of view of a programmer and, preferably, pseudo-code or Python code for it.

Given that I have

dic = { "Alice" : { "AC/DC" : 2, "The Raconteurs" : 3, "Mogwai" : 1 }, "Bob" : { "The XX" : 4, "Lady Gaga" : 3, "Mogwai" : 1, "The Raconteurs" : 1 } "Charlie" : { "AC/DC" : 7, "Lady Gaga" : 7 } }

where numbers are indicators of reproduction, how would I go through this to find the similarity of the bands?

+4

python statistics data-mining similarity

Felix dombek Feb 22 '11 at 17:36

source share

4 answers

nikow · Answer 1 · 2011-02-22T18:09:13+0000

The book " Collective Intelligence Programming: Creating Smart Web 2.0 Applications is a Classic and Uses Python. Among other things, it also deals with recommendation mechanisms.

enter image description here

ars · Answer 2 · 2011-03-16T20:25:57+0000

The Orange helps with startup. Another useful package available with source code is pysuggest , which implements a number of replication / collaborative filtering algorithms.

Rodrigo · Answer 3 · 2011-03-16T20:10:17+0000

I think what you are saying is collaborative filtering. As far as I know, Amazon and others use a Java framework called Apache Mahout, which in a nutshell is a “factory recommender” based on user / item data.

Check for free. However, I'm not sure if it is suitable for Python integration, I'm less than new to this department.

Erich owens · Answer 4 · 2012-06-17T08:31:33+0000

When you have data connecting users and products, you mean a bipartite graph between the two sets. Useful is the adjacency matrix (very rare) of this graph. If you do some work with normalizing the length of the columns, then multiply its transposition by the matrix itself, then in some sense you have similarities between the elements, since they are reflected in the intermediate user base.

How does the statistical calculation of "similar products / music / ..." from shopping / listening to customers work?

More articles: