I am trying to calculate item similarity in the style of Amazon's "Customers who viewed / purchased X also viewed / bought Y and Z". All the examples and references I have seen are either for computing item similarity for rated items, for finding user-user similarity, or for finding recommended items based on the current user's history. I would like to start with a non-personalized approach before factoring in the current user's preferences.
Looking at Amazon.com's paper on recommendations, they use the following logic to compute item-to-item similarities offline:
    For each item in product catalog, I1
        For each customer C who purchased I1
            For each item I2 purchased by customer C
                Record that a customer purchased I1 and I2
        For each item I2
            Compute the similarity between I1 and I2
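
To make sure I am reading the loop correctly, here is a minimal Python sketch of the "record" part only. The `purchases` dict (customer id to set of item ids) and the function name `copurchase_counts` are my own placeholders, not anything from the paper, and a real version would read from the database instead of an in-memory dict.

```python
from collections import defaultdict


def copurchase_counts(purchases):
    """Count, for every item I1, how many customers also bought each I2.

    purchases: dict mapping customer id -> set of purchased item ids
    (a hypothetical in-memory stand-in for the orders table).
    """
    # Invert customer -> items into item -> customers who bought it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    counts = defaultdict(lambda: defaultdict(int))
    for i1, customers in buyers.items():      # for each item I1
        for c in customers:                   # for each customer C who bought I1
            for i2 in purchases[c]:           # for each item I2 bought by C
                if i2 != i1:
                    counts[i1][i2] += 1       # record that C bought I1 and I2
    return {i1: dict(row) for i1, row in counts.items()}


purchases = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"B", "C"},
}
counts = copurchase_counts(purchases)
```

With this toy data, `counts["A"]` holds the items bought together with A and how many customers bought each pair.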
If I understand correctly, by the time we reach "Compute the similarity between I1 and I2", I have a list of items (I2) purchased in combination with a single value of I1 (from the outer loop).
How is this calculation done?
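
One candidate I can think of for the similarity step is the cosine measure on binary purchase vectors, where each item is represented by the set of customers who bought it. For 0/1 data it reduces to the number of shared buyers divided by the square root of the product of the two buyer counts. The sketch below is only my assumption of how that would look; `cosine_similarity` and the `buyers` dict are hypothetical names.

```python
import math


def cosine_similarity(buyers_1, buyers_2):
    """Cosine similarity between two items given as sets of customer ids.

    For binary (purchased / not purchased) vectors:
        |buyers_1 & buyers_2| / sqrt(|buyers_1| * |buyers_2|)
    """
    if not buyers_1 or not buyers_2:
        return 0.0  # an item nobody bought is similar to nothing
    shared = len(buyers_1 & buyers_2)
    return shared / math.sqrt(len(buyers_1) * len(buyers_2))


# Hypothetical data: item id -> set of customers who purchased it.
buyers = {
    "A": {"c1", "c2"},
    "B": {"c1", "c2", "c3"},
    "C": {"c1", "c3"},
}
sim_ab = cosine_similarity(buyers["A"], buyers["B"])
sim_ac = cosine_similarity(buyers["A"], buyers["C"])
```

Here A and B share two buyers and A and C share one, so A should come out closer to B than to C.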
Another thought is that I am overthinking this and making it harder than it needs to be. Would it be enough to run a top-n query on the count of I2 bought in combination with I1?
I would also appreciate suggestions on whether this approach is sound. My product database contains about 150 thousand products at any given time. Since the bulk of the reading material I have seen covers user-item similarity or even user-user similarity, should I go that route instead?
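
The top-n idea could be as simple as the following sketch, which just ranks I2 by raw co-purchase count for a given I1. The function name `top_n_copurchased` and the sample data are made up for illustration. My worry with this version is that raw counts will favor items that are popular with everyone, regardless of I1.

```python
from collections import Counter


def top_n_copurchased(purchases, item, n=3):
    """Return the n items most often bought by customers who bought `item`.

    purchases: dict mapping customer id -> set of purchased item ids
    (a hypothetical stand-in for the orders table).
    """
    tally = Counter()
    for items in purchases.values():
        if item in items:
            # Count every other item in this customer's purchases once.
            tally.update(i for i in items if i != item)
    return tally.most_common(n)


purchases = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"A", "B", "D"},
    "c4": {"A", "C"},
}
top_for_a = top_n_copurchased(purchases, "A", n=2)
```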
Edit: the languages involved are Python and, on the db side, Oracle PL/SQL.