Apache Spark ALS Recommendation

Question

I ran the small ALS example program from the Apache Spark website, which uses MLlib. With a dataset whose ratings range from 1 to 5 (I used the MovieLens dataset), it produces recommendations with predicted ratings greater than 5! The highest I found in my limited testing was 7.4. Obviously, I either misunderstand what the code should do, or something has gone wrong. I read up on latent factor recommender systems and was under the impression that Spark MLlib's ALS implementation is based on this paper.

Why would it return ratings higher than what is possible? It makes no sense.

Have I misunderstood the algorithm, or is the program wrong?

+6

machine-learning collaborative-filtering apache-spark apache-spark-mllib

monster Mar 14 '15 at 16:48

1 answer

Sean Owen · Accepted Answer · 2015-03-14T22:04:07+0000

You are looking at the right paper, but I think you are expecting the algorithm to do something it is not intended to do. It produces a low-rank approximation to your input as the product of two matrices, but nothing about multiplying matrices clamps the output values to the range of the input ratings.
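To make that concrete, here is a minimal sketch in plain Python (not Spark code). The factor vectors are made-up numbers, chosen only for illustration; in a real model they would be learned by ALS. A predicted rating is just the dot product of a user's factor vector and an item's factor vector, and nothing bounds that product:

```python
# Hypothetical rank-2 factor vectors, invented for illustration.
# In a trained ALS model these would be rows of the user and item factor matrices.
user_factors = [1.9, 1.4]
item_factors = [2.3, 2.2]


def predict(user, item):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(user, item))


prediction = predict(user_factors, item_factors)
print(prediction)  # well above 5, even though training ratings would be 1-5
```

The factors are fitted to minimize error on the observed ratings only, so for user/item pairs that were never observed the product can land outside 1-5.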

You can clamp or round the values. You may not want to, though, since the raw value gives you additional information about how strong a predicted 5 is. I suppose it is also theoretically not possible for the algorithm to assume that the maximum possible value is the maximum value observed in the input.
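As a rough sketch of the trade-off, again in plain Python: `clamp_rating` is a hypothetical helper, not part of MLlib, and apart from the 7.4 mentioned in the question the raw predictions are invented.

```python
def clamp_rating(predicted, low=1.0, high=5.0):
    """Force a raw prediction into the valid rating range [low, high]."""
    return max(low, min(high, predicted))


# 7.4 is the value from the question; the others are made up for illustration.
raw_predictions = [7.4, 5.6, 4.2, 0.3]

# For display: every value ends up in the 1-5 range.
clamped = [clamp_rating(p) for p in raw_predictions]

# For ranking: the raw values still distinguish 7.4 from 5.6,
# while after clamping both collapse to the same 5.0.
ranked = sorted(raw_predictions, reverse=True)

print(clamped)
print(ranked)
```

So if you only need to show a rating, clamping is fine; if you are ordering recommendations, it is probably better to sort on the raw predictions and clamp only what you display.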
