In my recommendation system, I ran into the all-pairs similarity problem. According to this Databricks blog, RowMatrix looks like it can come to the rescue.
However, RowMatrix is a matrix type without meaningful row indices, so I don't know how to get the similarity of two specific items i and j after calling columnSimilarities(threshold).
Below is some information about what I am doing:
1) My data file comes from MovieLens and has this format:
user::item::rating
2) I create a RowMatrix in which each sparse row vector i holds the ratings that all users gave to item i:
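    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.mllib.recommendation.Rating
    import org.apache.spark.rdd.RDD

    val dataPath = ...
    // Parse each "user::item::rating" line into a Rating
    val ratings: RDD[Rating] = sc.textFile(dataPath).map(_.split("::") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })
    // One sparse row per item; vector positions are (user id - 1)
    val rows = ratings.map(rating => (rating.product, (rating.user, rating.rating)))
      .groupByKey()
      .map(p => Vectors.sparse(userAmount, p._2.map(r => (r._1 - 1, r._2)).toSeq))
    val mat = new RowMatrix(rows)
    val similarities = mat.columnSimilarities(0.5)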
Now I have similarities, a CoordinateMatrix. How can I get the similarity of two specific items i and j? I can extract an RDD[MatrixEntry] from it through entries, but I'm not sure whether row i and column j of that matrix correspond to items i and j.
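For reference, here is a minimal sketch of how such a lookup could work. It relies on the documented behaviour that columnSimilarities compares the columns of the RowMatrix and returns an upper-triangular CoordinateMatrix, so each MatrixEntry(i, j, value) refers to columns i and j with i < j. Under that reading, the matrix built above (rows = items, columns = users) would yield user-user similarities. The sketch therefore assumes the opposite layout, with one row per user and one column per item. The object ItemSimilarityLookup, the helper similarityOf, the toy ratings and the 0-based item ids are all hypothetical and only serve as illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, RowMatrix}
import org.apache.spark.rdd.RDD

object ItemSimilarityLookup {
  // Hypothetical helper: similarity of columns i and j, if an entry exists.
  // Only entries with i < j are stored, so the pair is ordered first.
  // Returns None for i == j and for pairs that were dropped or are zero.
  def similarityOf(entries: RDD[MatrixEntry], i: Long, j: Long): Option[Double] = {
    val (lo, hi) = if (i <= j) (i, j) else (j, i)
    entries.filter(e => e.i == lo && e.j == hi).map(_.value).collect().headOption
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("item-similarity-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Toy (user, item, rating) triples with 0-based ids (assumed data)
    val ratings = sc.parallelize(Seq(
      (0, 0, 5.0), (0, 1, 5.0),
      (1, 0, 3.0), (1, 1, 3.0),
      (2, 2, 4.0)))
    val numItems = 3

    // One sparse row per user; column index = item id
    val rows = ratings
      .map { case (user, item, rating) => (user, (item, rating)) }
      .groupByKey()
      .map { case (_, itemRatings) => Vectors.sparse(numItems, itemRatings.toSeq) }

    val mat = new RowMatrix(rows)
    // No threshold: exact cosine similarities between item columns
    val similarities = mat.columnSimilarities()

    // Items 0 and 1 have proportional rating columns, so this should be close to 1.0
    println(similarityOf(similarities.entries, 1, 0))
    // Items 0 and 2 share no user, so no entry is expected
    println(similarityOf(similarities.entries, 0, 2))

    sc.stop()
  }
}
```

With a non-zero threshold such as 0.5, columnSimilarities uses sampling, so the returned values are estimates and low-similarity pairs may be missing. Returning an Option keeps that case explicit. For many lookups it would likely be cheaper to key the entries by (i, j) once rather than filter the RDD per query.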