Given a list of purchase events (customer_id, item):
1-hammer
1-screwdriver
1-nails
2-hammer
2-nails
3-screws
3-screwdriver
4-nails
4-screws
I am trying to create a data structure that reports how many times an item has been bought together with another item. They need not have been bought at the same time, only by the same customer at some point since I started saving data. The result would look like:
{
hammer : {screwdriver : 1, nails : 2},
screwdriver : {hammer : 1, screws : 1, nails : 1},
screws : {screwdriver : 1, nails : 1},
nails : {hammer : 2, screws : 1, screwdriver : 1}
}
This indicates that a hammer was bought together with nails twice (persons 1 and 2) and with a screwdriver once (person 1), that screws were bought with a screwdriver once (person 3), and so on.
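For illustration, here is a minimal sketch of one way to build this kind of structure directly from the events. The function name `co_purchase_counts` is made up for the example, and using a set per customer (so a repeat purchase of the same item is counted once) is an assumption about the intended semantics.

```python
from collections import defaultdict
from itertools import permutations

# Sample purchase events from the list above: (customer_id, item)
events = [
    (1, "hammer"), (1, "screwdriver"), (1, "nails"),
    (2, "hammer"), (2, "nails"),
    (3, "screws"), (3, "screwdriver"),
    (4, "nails"), (4, "screws"),
]

def co_purchase_counts(events):
    """Count, for each item, how many customers also bought each other item."""
    # Group items per customer; a set ignores repeat purchases (assumption).
    items_by_customer = defaultdict(set)
    for customer_id, item in events:
        items_by_customer[customer_id].add(item)

    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_customer.values():
        # Every ordered pair (a, b) with a != b adds one to counts[a][b].
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    # Convert to plain dicts for easier printing and comparison.
    return {item: dict(others) for item, others in counts.items()}

result = co_purchase_counts(events)
```

The work per customer grows with the square of the number of distinct items that customer bought, so this sketch is mainly suited to customers with modest purchase histories.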
My current approach:
users = dict, where userid is the key, and the list of items purchased is the value
usersForItem = dict, where itemid is the key, and the list of users who bought the item is the value
userlist = temporary list of users who bought the current item
pseudo:
for each event(customer,item)(sorted by item):
    add user to users dict if not exists, and add the items
    add item to items dict if not exists, and add the user
----------
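users = {}          # user -> list of items bought
usersForItem = {}   # item -> list of users who bought it
items = []          # distinct items, in the order they appear
userlist = []
last_item = None
# rows must be sorted by item so that each item's buyers are contiguous
for item, user in rows:
    users.setdefault(user, []).append(item)
    if item != last_item:
        if last_item is not None:
            usersForItem[last_item] = userlist
        userlist = [user]
        last_item = item
        items.append(item)
    else:
        userlist.append(user)
if last_item is not None:
    usersForItem[last_item] = userlist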
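As a point of comparison, the same two dicts could probably be built without relying on the rows being sorted by item. The sketch below assumes rows arrive as (item, user) pairs as in the loop above; the helper name `build_indexes` and the sample rows are made up for the example.

```python
from collections import defaultdict

def build_indexes(rows):
    """Return (users, usersForItem) from an iterable of (item, user) rows."""
    users = defaultdict(list)         # user -> items bought
    usersForItem = defaultdict(list)  # item -> users who bought it
    for item, user in rows:
        users[user].append(item)
        usersForItem[item].append(user)
    # Plain dicts avoid creating empty entries on later lookups.
    return dict(users), dict(usersForItem)

# Hypothetical, deliberately unsorted sample rows.
rows = [("hammer", 1), ("nails", 1), ("hammer", 2), ("screws", 3)]
users, usersForItem = build_indexes(rows)
```

Because each row updates both dicts directly, no `last_item` or temporary `userlist` bookkeeping is needed.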
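After this I have 2 dicts, users and usersForItem. I then loop over usersForItem and, for each item, go through the users who bought it and count their other purchases, in Python: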
relatedItems = {}
for key, listOfUsers in usersForItem.items():
    relatedItems[key] = {}
    related = []  # distinct items bought alongside key
    for ux in listOfUsers:
        for itemRead in users[ux]:
            if itemRead != key:
                if itemRead not in related:
                    related.append(itemRead)
                relatedItems[key][itemRead] = relatedItems[key].get(itemRead, 0) + 1
    # calc jaccard/tanimoto similarity between relatedItems[key] and its values
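The similarity step is only described in words above. One common reading is the Jaccard (Tanimoto) coefficient between the sets of buyers of two items, |A ∩ B| / |A ∪ B|. The sketch below assumes that reading; the `jaccard` helper and the `buyers` sample mapping are made up for the example and are not taken from the code above.

```python
def jaccard(a, b):
    """Jaccard (Tanimoto) similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty sets
    # float() keeps the division fractional on Python 2 as well.
    return float(len(a & b)) / len(union)

# Hypothetical item -> buyers mapping, shaped like usersForItem.
buyers = {"hammer": [1, 2], "nails": [1, 2, 4], "screws": [3, 4]}
sim = jaccard(buyers["hammer"], buyers["nails"])
```

With these sample sets the two items share two buyers out of three distinct buyers overall, so the score is two thirds.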