Apache Spark ALS Recommendation

I ran the small ALS example program recommended on the Apache Spark website, which uses MLlib. With a dataset of ratings from 1 to 5 (I used the MovieLens dataset), it produces recommendations with predicted ratings greater than 5! The highest I found in my limited testing was 7.4. Obviously, either I misunderstand what the code is supposed to do, or something is going wrong. I read up on latent factor recommender systems and got the impression that the ALS implementation in Spark MLlib is based on this work.

Why would it return ratings higher than the maximum possible? It makes no sense.

Did I misunderstand the algorithm, or is the program wrong?

1 answer

You are looking at the right paper, but I think you are expecting the algorithm to do something it is not intended to do. It produces a low-rank approximation to your input as the product of two matrices, but nothing about multiplying matrices clamps the output values to the range of the input.
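To illustrate why the product is unbounded, here is a minimal sketch in plain Python. A predicted rating in a factor model is the dot product of a user's factor vector and an item's factor vector. The factor values below are made up for illustration (they do not come from a trained model, and `predict` is a hypothetical helper, not an MLlib function); they only show that nothing in the arithmetic keeps the result at or below 5.

```python
def predict(user_factors, item_factors):
    """Predicted rating = dot product of the two latent factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

# Hypothetical rank-2 factors, chosen by hand for illustration only.
user = [2.0, 1.0]
item_a = [2.1, 0.5]
item_b = [3.0, 1.4]

pred_a = predict(user, item_a)  # falls inside the 1-5 rating scale
pred_b = predict(user, item_b)  # falls outside it: the dot product has no upper bound

print(pred_a, pred_b)
```

The second prediction lands above 5 even though every observed rating the factors were meant to approximate would lie between 1 and 5.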

You can clamp or round the values. You might not want to, though, since the raw value gives you additional information about how strong the predicted 5 is. I also think it is not theoretically possible for the algorithm to assume that the maximum possible value is the maximum value observed in the input.
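A minimal sketch of the clamping option, assuming a 1-5 rating scale. The helper name `clamp_rating` and the sample predictions are hypothetical, not part of MLlib. It also shows what is lost: two items that clamp to 5 become indistinguishable, while their raw scores still rank them.

```python
def clamp_rating(prediction, low=1.0, high=5.0):
    """Limit a raw prediction to the assumed rating scale [low, high]."""
    return max(low, min(high, prediction))

# Hypothetical raw predictions for four items.
raw = [7.4, 5.2, 4.7, 0.3]

# For display: force every value onto the 1-5 scale.
clamped = [clamp_rating(p) for p in raw]

# For ranking: keep the raw scores, which still separate 7.4 from 5.2.
ranked = sorted(raw, reverse=True)

print(clamped)
print(ranked)
```

One reasonable design is to rank recommendations by the raw score and clamp only when showing a rating to the user.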


Source: https://habr.com/ru/post/983795/

