Collaborative filtering with attributes in Neo4j using Cypher

I am using Neo4j to build a recommendation system. I have the following setup:

Nodes:

  • Users
  • Movies
  • Movie attributes (e.g. genre)

Relationships:

  • (m:Movie)-[w:WEIGHT {weight: 10}]->(a:Attribute)
  • (u:User)-[r:RATED {rating: 5}]->(m:Movie)
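
To make the setup concrete, a tiny sample graph could be created roughly like this. It is only a sketch: the property names `uid` and `title` and the rating of 4 are assumptions for illustration, while the two adventure weights come from the example further down.

```cypher
// Hypothetical sample data; uid, title and the ratings are illustrative assumptions.
CREATE (u1:User {uid: "user1"}),
       (u4:User {uid: "user4"}),
       (lotr:Movie {title: "The Lord of the Rings"}),
       (titanic:Movie {title: "Titanic"}),
       (adv:Attribute {title: "Adventure"}),
       // Attribute weights are stored per movie on the WEIGHT relationship.
       (lotr)-[:WEIGHT {weight: 10}]->(adv),
       (titanic)-[:WEIGHT {weight: 5}]->(adv),
       // Ratings are stored per user on the RATED relationship.
       (u1)-[:RATED {rating: 5}]->(lotr),
       (u4)-[:RATED {rating: 4}]->(titanic)
```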

[Diagram of the graph described above; image not preserved]

Now I'm trying to figure out how to apply a collaborative filtering scheme that works as follows:

  • Checks which attributes the user liked (implicitly, by liking movies).
  • Finds other users who liked similar attributes.
  • Recommends the top movies that the user has NOT seen but similar users have seen.

The catch is that each attribute has a specific weight for each movie. For instance, the adventure genre may have a weight of 10 for The Lord of the Rings but a weight of 5 for Titanic.

In addition, the system must consider the rating of each movie. For instance, if another user rated The Lord of the Rings 5, then that user's attributes for The Lord of the Rings are scaled by 5, not 10. A user whose implicit attribute ratings are also close to 5 should then get this movie recommended, unlike a user who rated similar attributes higher.
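
One way to read this requirement, offered as an assumption rather than a settled definition, is as a per-attribute score that multiplies each rating by the attribute's weight for that movie and sums over the movies the user has rated. The symbols r_{u,m} (rating of movie m by user u), w_{m,a} (weight of attribute a for movie m), and M_u (movies rated by u) are introduced here only for illustration:

```latex
\[
  \mathrm{score}(u, a) \;=\; \sum_{m \in M_u} r_{u,m} \, w_{m,a}
\]
```

For example, a rating of 5 for The Lord of the Rings with an adventure weight of 10 contributes 5 × 10 = 50 to that user's adventure score.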

I started by simply recommending movies that other users have rated, but I'm not sure how to take the RATED and WEIGHT relationships into account. It also did not work:

 MATCH (user:User)-[:RATED]->(movie1)<-[:RATED]-(ouser:User), (ouser)-[:RATED]->(movie2)<-[:RATED]-(oouser:User) WHERE user.uid = "user4" AND NOT (user)-[:RATED]->(movie2) RETURN oouser 
2 answers

What you are looking for, mathematically speaking, is a simplified Jaccard index between two users, that is, a measure of how similar they are based on how much they have in common. I say simplified because we do not take into account the movies on which they disagree. Essentially, and following your order, it would be:

1) Get the total weight of each attribute for each user. For instance:

 MATCH (user:User {name: 'user1'})
 OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)-[w:WEIGHT]->(a:Attribute)
 WITH user, r.rating * w.weight AS totalWeight, a
 WITH user, a, sum(totalWeight) AS totalWeight

We need the final WITH clause because, up to that point, there is one row per movie-attribute combination; sum() collapses them into one row per attribute.

2) Then we get users with similar tastes. This is a performance danger zone, so some prefiltering may be required. But, brute-forcing it, we get the users who like each attribute within a 10% margin (for example):

 WITH user, a, totalWeight * 0.9 AS minimum, totalWeight * 1.10 AS maximum
 MATCH (a)<-[w:WEIGHT]-(m:Movie)<-[r:RATED]-(otherUser:User)
 WHERE w.weight * r.rating > minimum AND w.weight * r.rating < maximum
 WITH DISTINCT user, otherUser

So now we have one row (unique thanks to the DISTINCT in the last line) for each other user who is a match. To be honest, I would need to test whether other users who match on just a single attribute are included here. If they are, an additional filter will be needed, but I think that can wait until the basic query works.

3) Now it's easy:

 MATCH (otherUser)-[r:RATED]->(m:Movie)
 WHERE NOT (user)-[:RATED]->(m)
 RETURN m, sum(r.rating) AS totalRating
 ORDER BY totalRating DESC
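
Put together, the three steps could be sketched as a single query. This is an untested sketch under a few assumptions, not the exact code of this answer: it matches the user by the `uid` property used in the question instead of `name`, it excludes the user from their own neighbours, and, as a deliberate variation, it compares the other user's summed score per attribute rather than each single movie's product.

```cypher
// Step 1: total rating-weighted score per attribute for the target user.
MATCH (user:User {uid: "user4"})-[r:RATED]->(:Movie)-[w:WEIGHT]->(a:Attribute)
WITH user, a, sum(r.rating * w.weight) AS totalWeight
// Step 2: other users whose score for the same attribute is within 10%.
MATCH (a)<-[w2:WEIGHT]-(:Movie)<-[r2:RATED]-(otherUser:User)
WHERE otherUser <> user
WITH user, a, totalWeight, otherUser, sum(r2.rating * w2.weight) AS otherWeight
WHERE otherWeight > totalWeight * 0.9 AND otherWeight < totalWeight * 1.1
WITH DISTINCT user, otherUser
// Step 3: movies those users rated that the target user has not rated.
MATCH (otherUser)-[r3:RATED]->(m:Movie)
WHERE NOT (user)-[:RATED]->(m)
RETURN m, sum(r3.rating) AS totalRating
ORDER BY totalRating DESC
```

The 10% band is the same arbitrary example threshold as in step 2 and would need tuning on real data.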

As mentioned earlier, the hard part is 2), but once we know how to get the math right, it should be simpler. Oh, and about the math: for it to work correctly, the weights of each movie should sum to 1 (normalization). Otherwise, differences between the total weights of different movies can lead to unfair comparisons.
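
The normalization could be computed on the fly along these lines. This is a minimal sketch that assumes every `weight` is a non-negative number; `movieTotal` and `normalizedWeight` are names made up for the example.

```cypher
// Sum the attribute weights of each movie, skipping movies whose total is 0.
MATCH (m:Movie)-[w:WEIGHT]->(:Attribute)
WITH m, sum(w.weight) AS movieTotal
WHERE movieTotal > 0
// Divide each weight by its movie's total so the weights of a movie sum to 1.
MATCH (m)-[w:WEIGHT]->(a:Attribute)
RETURN m, a, toFloat(w.weight) / movieTotal AS normalizedWeight
```

`toFloat` is used so that integer weights are not divided with integer division.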

I wrote this without proper study (paper, pencil, equations, statistics) and without trying the code on a sample dataset. Hope it helps you anyway!

If you need this recommendation without taking user ratings or attribute weights into account, it should be enough to replace the math in steps 1) and 2) so that it uses only w.weight (to ignore ratings) or only r.rating (to ignore weights). The RATED and WEIGHT relationships would still be traversed, so, for example, an adventure movie would still be recommended to fans of adventure movies, but the result would no longer be modified by ratings or attribute weights, whichever we chose to drop.
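
As an illustration of that substitution, step 1 with ratings ignored might look like the sketch below. It assumes the user is matched by the `uid` property from the question.

```cypher
// Ratings ignored: each attribute's total is the sum of the weights alone.
MATCH (user:User {uid: "user4"})-[:RATED]->(:Movie)-[w:WEIGHT]->(a:Attribute)
RETURN a, sum(w.weight) AS totalWeight
```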

EDIT: Code edited to correct syntax errors discussed in the comments.


The answer to your first request:

Checks which attributes the user liked (implicitly, by liking movies)

 MATCH (user:User)
 WHERE user.uid = "user4"
 OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)
 OPTIONAL MATCH (m)-[w:WEIGHT]->(a:Attribute)
 RETURN user, collect({a: a.title})

This is a partial query that first finds the movies rated by the user, then finds the attributes of those movies, and finally returns the list of attributes the user liked.

You can change the RETURN clause to collect(a) AS attributes if you need the whole nodes.
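
A sketch of that variant, assuming the same `uid` lookup, with DISTINCT added so that an attribute shared by several rated movies is listed only once:

```cypher
// Return the user together with the distinct Attribute nodes of the movies they rated.
MATCH (user:User {uid: "user4"})
OPTIONAL MATCH (user)-[:RATED]->(:Movie)-[:WEIGHT]->(a:Attribute)
RETURN user, collect(DISTINCT a) AS attributes
```

Because collect() skips null values, a user with no rated movies comes back with an empty list rather than no row at all.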


Source: https://habr.com/ru/post/1265438/

