How much data is needed for user-based or item-based CFs to give recommendations?

Question


How much data do user-based CF and item-based CF need before they can give recommendations?

I manually created a small dataset so that I could understand how the algorithms work.
I found that on this small dataset, Slope-One can give a recommendation, but user-based CF and item-based CF cannot.

What is the reason for this?
What is the data volume threshold?


recommendation-engine

James.Xu Mar 29 '11 at 9:59


3 answers

The MovieLens, Netflix, Jester, and KDD Cup datasets are all publicly available. If you are having trouble getting them, check out http://code.google.com/p/recsyscode/wiki/dataset
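To experiment with these datasets, the ratings first have to be loaded into a user → item → rating structure. Below is a minimal sketch that assumes the tab-separated layout of the MovieLens 100K `u.data` file (user id, item id, rating, timestamp). The function name `parse_ratings` and the file path are hypothetical, and the sample lines are made up in the same format.

```python
from collections import defaultdict


def parse_ratings(lines):
    """Parse lines of the form user<TAB>item<TAB>rating<TAB>timestamp.

    Returns a dict {user_id: {item_id: rating}}; the timestamp is ignored.
    """
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        ratings[int(user)][int(item)] = float(rating)
    return dict(ratings)


# Usage with a local copy of the file (the path is hypothetical):
# with open("ml-100k/u.data") as f:
#     ratings = parse_ratings(f)

sample = ["1\t10\t4\t881250949", "2\t10\t3\t891717742", "1\t20\t5\t878887116"]
print(parse_ratings(sample)[1])  # {10: 4.0, 20: 5.0}
```

Other datasets use different file layouts, so only the line-splitting step would need to change.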


Alpha Jul 08 '11 at 6:29


For a small dataset, there may be little difference between user-based CF and item-based CF. For large datasets where there are many more users than items (for example, the Netflix dataset and the Yahoo! KDD Cup 2011 dataset), item-based CF is much faster than user-based CF.
For Top-N recommendation, user-based CF and item-based CF achieve about the same accuracy, but their coverage differs: user-based CF is good at recommending long-tail items, while item-based CF offers more variety.
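A rough sketch of both points, using hypothetical sizes and data. The first part counts candidate similarity pairs, n(n-1)/2, which is one simplified way to see why item-based CF is cheaper when users far outnumber items (real implementations usually compare only pairs that share ratings). The second part computes catalog coverage, the fraction of catalog items that appear in at least one top-N list; this is one common definition of coverage and an assumption about what the answer means. The helper names `num_pairs` and `catalog_coverage` are illustrative.

```python
def num_pairs(n):
    """Unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def catalog_coverage(top_n_lists, catalog):
    """Fraction of catalog items that appear in at least one top-N list."""
    recommended = set()
    for items in top_n_lists.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


# Hypothetical Netflix-like sizes: far more users than items.
n_users, n_items = 480_000, 18_000
user_pairs = num_pairs(n_users)  # candidate user-user similarities
item_pairs = num_pairs(n_items)  # candidate item-item similarities
print(f"user-user pairs: {user_pairs:,}")  # user-user pairs: 115,199,760,000
print(f"item-item pairs: {item_pairs:,}")  # item-item pairs: 161,991,000

# Hypothetical top-2 lists from two recommenders for the same three users.
catalog = ["a", "b", "c", "d", "e"]
recs_1 = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "c"]}
recs_2 = {"u1": ["a", "d"], "u2": ["b", "e"], "u3": ["c", "a"]}
print(catalog_coverage(recs_1, catalog))  # 0.6
print(catalog_coverage(recs_2, catalog))  # 1.0
```

Both recommenders above could score the same accuracy for each user while covering very different parts of the catalog, which is the distinction the answer draws.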


Qiang yan Feb 14 '12 at 1:30


miette · Accepted Answer · 2011-04-01T07:54:35+0000

For both user-based and item-based CF, the dataset can be very small. What matters is how often users and items overlap in the data, that is, how many items two users have rated in common (or how many users two items share). If a user appears in the dataset only once, user-based CF will most likely not give that user any recommendations, because a single common item does not provide enough evidence of similarity for two users to become neighbors (for example, a Pearson correlation computed over a single co-rated item is undefined). This is just one example. For a small dataset of, say, around 1000 ratings, both recommenders will return results from their most-similar-items and recommend methods. For much smaller datasets, however, it helps to curate the data by hand and check whether there is enough information about the requested user or item ID. In this link you can find a very small, controlled dataset for building item-based CF and seeing how it works. I hope this answer is helpful.
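To make the overlap argument concrete, here is a minimal sketch on a hypothetical three-user dataset. It assumes Pearson correlation over co-rated entries as the similarity measure, and weighted Slope One for comparison; the names `pearson`, `transpose` and `slope_one_predict` are illustrative and not taken from any particular library. Every pair of users and every pair of items shares at most one rating, so neither user-based nor item-based CF finds a usable neighbor, while Slope One still predicts from averaged rating differences.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the keys a and b share; None if undefined."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None  # with one shared key both deviations are 0, so 0/0
    xs = [a[k] for k in common]
    ys = [b[k] for k in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # constant ratings: correlation undefined
    return cov / (sx * sy)


def transpose(data):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    out = {}
    for user, items in data.items():
        for item, r in items.items():
            out.setdefault(item, {})[user] = r
    return out


def slope_one_predict(data, user, target):
    """Weighted Slope One prediction of user's rating for the target item."""
    num, den = 0.0, 0
    for item, r_ui in data[user].items():
        if item == target:
            continue
        # Rating differences (target - item) from users who rated both.
        diffs = [r[target] - r[item] for r in data.values()
                 if target in r and item in r]
        if diffs:
            dev = sum(diffs) / len(diffs)
            num += (dev + r_ui) * len(diffs)
            den += len(diffs)
    return num / den if den else None


data = {
    "alice": {"i1": 5, "i2": 3},
    "bob": {"i1": 4, "i3": 2},
    "carol": {"i2": 4, "i3": 5},
}

# User-based CF: alice shares only one rated item with each other user.
user_sims = {u: pearson(data["alice"], data[u]) for u in data if u != "alice"}
print(user_sims)  # {'bob': None, 'carol': None}

# Item-based CF: i3 shares only one rater with each item alice rated.
by_item = transpose(data)
item_sims = {i: pearson(by_item["i3"], by_item[i]) for i in data["alice"]}
print(item_sims)  # {'i1': None, 'i2': None}

# Slope One only needs averaged rating differences, so it still predicts.
print(slope_one_predict(data, "alice", "i3"))  # 3.5
```

This is consistent with the behavior described in the question: with this little overlap the CF similarities are undefined, so the limiting factor is the co-occurrence in the data rather than its raw size.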
