I am working on an algorithm that calculates a similarity coefficient between restaurants. Before the similarity can be calculated, I need at least 3 users who rated both restaurants. Here is a table with a possible scenario as an example:
       | Restaurant 1 | Restaurant 2
User 1 | X            | 2
User 2 | 1            | 5
User 3 | 4            | 3
User 4 | 2            | 1
User 5 | X            | 5
Here X means the user did not review that restaurant, and the numbers are the ratings the users gave in their reviews. You can see that the similarity can be calculated because users 2, 3, and 4 rated both restaurants.
Because I use adjusted cosine similarity, I also need the average rating given by each user.
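For reference, this is the usual textbook form of adjusted cosine similarity between two restaurants i and j. The symbols are notation introduced here, not names from my code: U is the set of users who rated both restaurants, R_{u,i} is the rating user u gave restaurant i, and \bar{R}_u is the average of all ratings given by user u.

```latex
$$
\operatorname{sim}(i, j) =
\frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}
     {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}
$$
```

This is why both the co-rated reviews and the per-user averages are needed.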
Currently I fetch a list of all restaurants and use a nested for loop over every pair of restaurants to check whether their similarity can be calculated:
for (int i = 0; i < allRestaurants.Count; i++)
    for (int j = 0; j < allRestaurants.Count; j++)
        if (i < j)
            matrix.Add(new Similarity()
            {
                Id = Guid.NewGuid(),
                FirstRest = allRestaurants[i],
                SecondRest = allRestaurants[j],
                Sim = ComputeSimilarity(allRestaurants[i], allRestaurants[j], allReviews)
            });
Inside ComputeSimilarity I use the following LINQ statement to check the number of matches:
public double ComputeSimilarity(
    Guid restaurant1,
    Guid restaurant2,
    IEnumerable<Tuple<List<Review>, double>> allReviews)
{
    // ...
}
As you can see, this approach is very heavy: because of the nested for loop, the running time grows quickly as the number of restaurants increases.
I decided that a better approach would be to fetch only the restaurant pairs that already meet the requirement, but I am stuck on how to do this. I think it should be possible to get a list of "matches", which would be a list of tuples of the form Tuple<Review, Review, double>, where both reviews are from the same user and the double is that user's average rating.
I have made several attempts, but I keep getting stuck when I try to add the condition that only restaurant pairs with at least 3 matches should be returned.
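To make the idea concrete, here is a rough sketch of the kind of query I mean, not a tested solution. It groups reviews by user, pairs up each user's reviews, then groups those pairs by restaurant pair and keeps only pairs with enough matches. `MatchFinder` and `FindComparablePairs` are hypothetical names, the `Review` class is a simplified stand-in without the Entity Framework attributes, and the sketch assumes a user reviews a given restaurant at most once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Review entity (assumption: no EF attributes needed here).
public class Review
{
    public Guid Id { get; set; }
    public int Rating { get; set; }
    public Guid RestaurantId { get; set; }
    public Guid UserId { get; set; }
}

public static class MatchFinder
{
    // Hypothetical helper: maps each restaurant pair that was co-rated by at least
    // minMatches users to its list of (review1, review2, userAverage) tuples.
    public static Dictionary<Tuple<Guid, Guid>, List<Tuple<Review, Review, double>>>
        FindComparablePairs(IEnumerable<Review> reviews, int minMatches = 3)
    {
        return reviews
            .GroupBy(r => r.UserId)
            .SelectMany(userReviews =>
            {
                // Average over all of this user's ratings, as adjusted cosine requires.
                double avg = userReviews.Average(r => r.Rating);

                // Sort by restaurant id so a pair always appears in the same order,
                // no matter which user produced it.
                var ordered = userReviews.OrderBy(r => r.RestaurantId).ToList();

                // Every unordered pair of this user's reviews is one match.
                return from i in Enumerable.Range(0, ordered.Count)
                       from j in Enumerable.Range(i + 1, ordered.Count - i - 1)
                       select Tuple.Create(ordered[i], ordered[j], avg);
            })
            .GroupBy(m => Tuple.Create(m.Item1.RestaurantId, m.Item2.RestaurantId))
            .Where(g => g.Count() >= minMatches)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

This only ever touches restaurant pairs that share at least one user, instead of every possible pair. It still enumerates all review pairs per user, so users with very many reviews could make it expensive.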
For reference, my review object is as follows:
public class Review
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public virtual Guid Id { get; set; }
    public virtual int Rating { get; set; }
    public virtual Guid RestaurantId { get; set; }
    public virtual Guid UserId { get; set; }
    // ...
}
And my restaurant object:
public class Restaurant
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public virtual Guid Id { get; set; }
    // ...
}
I am looking for something better than my current approach. Can anyone point me in the right direction or suggest a better approach? If you need more information, let me know. Thanks in advance!
Edit: The first example shows two restaurants, but the list could of course be longer. The point is that I only want the restaurants for which the similarity can be calculated.
So take the following example:
       | Restaurant 1 | Restaurant 2 | Restaurant 3
User 1 | X            | 2            | X
User 2 | 1            | 5            | X
User 3 | 4            | 3            | 3
User 4 | 2            | 1            | 2
User 5 | X            | 5            | X
User 6 | X            | X            | 2
The only pair for which the similarity can be calculated is restaurant 1 and restaurant 2. For the pairs involving restaurant 3 there are not enough matches (at least 3 are needed, and there are only 2), so the similarity cannot be calculated. Thus, to optimize this, I need to build a list of only those restaurants for which the similarity can be calculated.
To explain further: a match is a case where the same user rated both restaurants. Restaurant 3 has 3 reviews, but only 2 of them are matches, since User 6 rated only this restaurant.
So, if the 3 restaurants above are given as input, the result should be a list of the restaurants for which the similarity can be calculated (in this case, only restaurants 1 and 2).
Edit 2: I will add an example of what my desired result should look like:
A "match" is a case where at least 3 users have rated the same 2 restaurants. So, let's say we have restaurants X and Y; the output may look like this:
       | Restaurant X | Restaurant Y
User 1 | 5            | 3
User 2 | 2            | 5
User 3 | 1            | 2
Now, if we add a third restaurant to the list, which each of these users also reviewed:
       | Restaurant X | Restaurant Y | Restaurant Z
User 1 | 5            | 3            | 2
User 2 | 2            | 5            | 3
User 3 | 1            | 2            | 1
Now the similarity can be calculated between every pair of restaurants here: X and Y, X and Z, and Y and Z.
This can be modeled in a separate class as follows:
public class Match
{
    public Review rev1 { get; set; }
    public Review rev2 { get; set; }
    // ...
}
The similarity can be calculated if we have 3 of these matches, all with the same RestaurantId on rev1 and the same RestaurantId on rev2.
So, a list of these matches might look like this:
- Match 1:
  rev1.RestaurantId = 1 | rev2.RestaurantId = 2 | UserId = 11 (this UserId is the same on rev1 and rev2)
- Match 2:
  rev1.RestaurantId = 1 | rev2.RestaurantId = 2 | UserId = 12 (this UserId is the same on rev1 and rev2)
- Match 3:
  rev1.RestaurantId = 1 | rev2.RestaurantId = 2 | UserId = 13 (this UserId is the same on rev1 and rev2)
I know that the identifiers are actually GUIDs; the integers here are purely for the example.
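Given a list of matches like this for one restaurant pair, the adjusted cosine similarity could be computed roughly as in the sketch below. This is an illustration, not my actual ComputeSimilarity: `AdjustedCosine.Compute` is a hypothetical name, each tuple holds (rating from rev1, rating from rev2, the user's average rating), and returning 0 when the denominator is zero is an arbitrary choice made for the sketch.

```csharp
using System;
using System.Collections.Generic;

public static class AdjustedCosine
{
    // Each match is (rating for restaurant 1, rating for restaurant 2, user's average
    // rating), with all three values coming from the same user.
    public static double Compute(IEnumerable<Tuple<int, int, double>> matches)
    {
        double dot = 0, norm1 = 0, norm2 = 0;
        foreach (var m in matches)
        {
            // Subtract the user's average so generous and strict raters are comparable.
            double d1 = m.Item1 - m.Item3;
            double d2 = m.Item2 - m.Item3;
            dot += d1 * d2;
            norm1 += d1 * d1;
            norm2 += d2 * d2;
        }

        double denominator = Math.Sqrt(norm1) * Math.Sqrt(norm2);

        // Assumption: treat an undefined similarity (zero denominator) as 0.
        return denominator == 0 ? 0 : dot / denominator;
    }
}
```

The caller would be responsible for passing only restaurant pairs that already have at least 3 matches.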
I hope this made sense.