Basic Pseudocode for using SVD with Movielens / Netflix dataset

Question

I'm struggling to figure out exactly how to start using SVD with a dataset like MovieLens / Netflix for rating prediction. I would really appreciate any simple samples in Python / Java, or basic pseudocode of the process involved. There are a number of papers and posts that summarize the general concept, but I'm not sure how to begin implementing it, even with the libraries on offer.

As far as I understand, I need to convert my initial dataset as follows:

Initial Dataset:

user  movie  rating
1     43     3
1     57     2
2     219    4

Required pivot:

        user
movie   1   2
43      3   0
57      2   0
219     0   4

At this point, do I just feed this matrix into the SVD algorithm provided by one of the available libraries and then (somehow) extract the results, or is there more work required on my part?
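A minimal sketch of the pivot described above, followed by a plain dense SVD, assuming NumPy is available. The names `low_rank` and `predict` are made up for illustration and do not come from any of the libraries listed below. Treating unrated cells as 0 is a simplification carried over from the matrix above; the Netflix Prize write-ups linked below (for example the sifter.org post) instead fit the factors only to the ratings that actually exist.

```python
import numpy as np

# Toy ratings from the question: (user, movie, rating) triples.
ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]

# Map raw IDs to contiguous row/column indices.
users = sorted({u for u, _, _ in ratings})
movies = sorted({m for _, m, _ in ratings})
user_index = {u: j for j, u in enumerate(users)}
movie_index = {m: i for i, m in enumerate(movies)}

# Movies as rows, users as columns, 0 meaning "not rated" (an assumption).
matrix = np.zeros((len(movies), len(users)))
for u, m, r in ratings:
    matrix[movie_index[m], user_index[u]] = r

# Thin SVD: matrix == U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(matrix, full_matrices=False)


def low_rank(k):
    """Rebuild the matrix from the k largest singular values only."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def predict(user, movie, k=1):
    """Read a predicted rating out of the rank-k approximation."""
    return low_rank(k)[movie_index[movie], user_index[user]]


print(matrix)
print("singular values:", s)
print("rank-1 prediction for user 2, movie 219:", predict(2, 219))
```

On real MovieLens / Netflix data the matrix is far too large and sparse for a dense decomposition like this, so the sketch only shows the shape of the steps: pivot, decompose, truncate to `k` factors, read predictions from the reconstruction.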

Some information I read:

http://www.netflixprize.com/community/viewtopic.php?id=1043
http://sifter.org/~simon/journal/20061211.html
http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine
http://www.slideshare.net/bmabey/svd-and-the-netflix-dataset-presentation
... and a number of other papers

Some libraries:
LingPipe (java)
Jama (java)
Pyrsvd (python)

Any advice at all would be appreciated, especially on how to prepare a basic dataset. Thank you so much, Oli

+4

recommendation-engine collaborative-filtering svd prediction netflix

oli Mar 14 '11 at 1:19

2 answers

Dataset: http://www.grouplens.org/node/73

SVD: If you don't understand how to compute an SVD, why not just do it in SAGE? Wolfram Alpha or http://www.bluebit.gr/matrix-calculator/ will decompose the matrix for you, or you can read up on the method on Wikipedia.

+2

isomorphismes Mar 14 '11 at 4:14

Sean Owen · Accepted Answer · 2011-03-15T09:31:56+0000

See SVDRecommender in Apache Mahout. The answer to your question about the input format depends entirely on which library or code you use; there is no single standard. At some level, yes, the code will build some kind of matrix internally. For Mahout, the input for all recommenders, when supplied as a file, is a CSV file with lines of the form userID,itemID,rating.
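As a rough illustration of that file format, the triples from the question can be written out as userID,itemID,rating lines with no pivot at all. This is only a sketch using Python's standard `csv` module; the helper name `to_csv` is hypothetical, and leaving out a header row is an assumption about what the consuming code expects.

```python
import csv
import io

# Toy ratings from the question: (userID, itemID, rating) triples.
ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]


def to_csv(triples):
    """Serialize rating triples as one 'userID,itemID,rating' line each."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for user_id, item_id, rating in triples:
        writer.writerow([user_id, item_id, rating])
    return buf.getvalue()


csv_text = to_csv(ratings)
print(csv_text, end="")
```

Writing `csv_text` to a file would give something in the shape described above; whether a particular library wants extra columns or a different delimiter should be checked against its own documentation.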
