Spark: recommendProductsForUsers in MatrixFactorizationModel.scala takes a very long time to complete

I have a cluster of 9 nodes, and each node has the following configuration:

[Images: node configuration]

I am trying to generate recommendations for all users from a MatrixFactorizationModel using its recommendProductsForUsers function. It takes a very long time to complete (for example, about 1 hour of data takes about 34 hours). Is this because it iterates over the matrix several times?
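To see why this call is expensive, here is a minimal single-machine sketch of what "recommend the top K products for every user" amounts to: score each user's latent factor vector against every product's factor vector with a dot product and keep the K highest scores. This is an illustrative brute-force version, not Spark's actual implementation (Spark distributes this work across the cluster); the names `TopKSketch`, `topKForAllUsers`, `userFactors` and `productFactors` are hypothetical. The point is the cost: every user is scored against every product, roughly users × products × rank multiply-adds.

```scala
// Hypothetical, single-machine illustration of all-users top-K recommendation.
// Each user and product is represented by a latent factor vector of length `rank`.
object TopKSketch {
  // Dot product of two factor vectors of the same length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // For every user, score all products and keep the k best (productId, score) pairs.
  // The work is O(users * products * rank), which is why this step grows so quickly.
  def topKForAllUsers(
      userFactors: Map[Int, Array[Double]],
      productFactors: Map[Int, Array[Double]],
      k: Int): Map[Int, Seq[(Int, Double)]] =
    userFactors.map { case (userId, uf) =>
      val scored = productFactors.toSeq.map { case (productId, pf) => (productId, dot(uf, pf)) }
      userId -> scored.sortBy { case (_, score) => -score }.take(k)
    }

  def main(args: Array[String]): Unit = {
    // Tiny made-up factors with rank 2 (the question uses rank 50).
    val users = Map(1 -> Array(1.0, 0.0), 2 -> Array(0.0, 1.0))
    val products = Map(10 -> Array(0.9, 0.1), 20 -> Array(0.2, 0.8), 30 -> Array(0.5, 0.5))
    topKForAllUsers(users, products, k = 2).toSeq.sortBy(_._1).foreach { case (u, recs) =>
      println(s"user $u -> ${recs.mkString(", ")}")
    }
  }
}
```

With rank 50 and real user and product counts, the number of dot products (not the number of passes over the matrix) is what dominates the running time.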

How can I reduce the running time?

This is my spark-submit configuration:

spark-submit --jars $JAR_LOC --class com.collaborativefiltering.CustomerCollaborativeJob --driver-memory 5G --num-executors 7 --executor-cores 2 --executor-memory 20G --master yarn-client cust_rec/cust-rec.jar --period 1month --out /PATH --rank 50 --last 2 --lambda 0.25 --alpha 300 --topK 20

Thank you in advance.

1 answer

I found that recommendProductsForUsers in MatrixFactorizationModel goes through multiple iterations, so the computation takes a long time. Once I started running my jobs in the cloud, I tested the job with more nodes and more Spark executors. It really worked: I was able to finish the job within 4 hours.
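A rough back-of-the-envelope way to see why adding nodes and executors helped: the scoring work is about users × products × rank multiply-adds, and Spark can run at most numExecutors × executorCores tasks at once. The sketch below uses made-up user and product counts (the question does not give them) and hypothetical names (`ScoringCost`, `multiplyAdds`, `concurrentTasks`); only rank 50 and the 7 executors × 2 cores come from the spark-submit line above. It ignores shuffle, sorting and scheduling overhead, so it is an idealized picture of how the work divides, not a runtime prediction.

```scala
// Back-of-the-envelope estimate of the all-users scoring work and how it is split
// across concurrently running Spark tasks. Idealized: ignores shuffle, sort and
// scheduling overhead, and assumes the work divides evenly across slots.
object ScoringCost {
  // Total multiply-adds needed to score every user against every product.
  def multiplyAdds(users: Long, products: Long, rank: Long): Long =
    users * products * rank

  // Maximum number of tasks Spark can run at the same time on the executors.
  def concurrentTasks(numExecutors: Int, executorCores: Int): Int =
    numExecutors * executorCores

  def main(args: Array[String]): Unit = {
    // Hypothetical sizes: 1,000,000 users and 100,000 products; rank 50 as in the question.
    val work = multiplyAdds(users = 1000000L, products = 100000L, rank = 50L)
    // (7, 2) matches the question; the larger configurations are hypothetical.
    for ((executors, cores) <- Seq((7, 2), (14, 2), (28, 4))) {
      val slots = concurrentTasks(executors, cores)
      println(f"$executors%2d executors x $cores cores = $slots%3d slots, ${work / slots}%,d multiply-adds per slot")
    }
  }
}
```

With 7 executors × 2 cores, only 14 tasks run at once; assuming the factor data is split into at least as many partitions as there are slots, more executors shrink each slot's share of the work, which is consistent with the speed-up reported above.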


Source: https://habr.com/ru/post/1262165/

