I am studying learning mechanisms, and I looked at a document that describes how Google News uses collaborative filtering to recommend news that may interest its users.
One interesting method they mention is MinHashing. I went through what it does, but I only have a fuzzy idea of it, and there is a good chance that I am wrong. Here is what I could make of it:
- Gather a set of all the news.
- Define a hash function for the user. This hash function returns the index, in the list of all news, of the first item that this user has viewed.
- Gather, say, n such values, and represent the user with this list of values.
- Based on these lists, calculate the similarity between two users as the number of elements their lists have in common. Comparing short lists instead of full viewing histories significantly reduces the number of comparisons.
- Based on these similarity measures, group the users into different clusters.
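To make my understanding concrete, here is a minimal Python sketch of the steps above. It is only an illustration under my own assumptions, not the method from the Google News document: the names `make_permutations`, `minhash_signature` and `estimated_similarity` are made up, each "hash function" is simulated by an explicit random shuffle of the news indices, and the lists are compared position by position (the usual MinHash convention) rather than as unordered sets.

```python
import random

def make_permutations(num_items, n, seed=0):
    """Return n random permutations; perm[i] is the shuffled position of item i."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n):
        positions = list(range(num_items))
        rng.shuffle(positions)
        perms.append(positions)
    return perms

def minhash_signature(viewed, perms):
    """For each permutation, keep the smallest shuffled position among the viewed items."""
    return [min(perm[item] for item in viewed) for perm in perms]

def estimated_similarity(sig_a, sig_b):
    """Fraction of positions at which the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def jaccard(a, b):
    """Exact Jaccard similarity, used here only to check the estimate."""
    return len(a & b) / len(a | b)

# Toy data (hypothetical): news items are indices 0..999, users are sets of viewed items.
num_news = 1000
alice = set(range(0, 60))
bob = set(range(30, 90))      # shares 30 items with alice
carol = set(range(500, 560))  # shares nothing with alice

perms = make_permutations(num_news, n=200)
sig_alice = minhash_signature(alice, perms)
sig_bob = minhash_signature(bob, perms)
sig_carol = minhash_signature(carol, perms)

print("exact     alice/bob:", jaccard(alice, bob))
print("estimated alice/bob:", estimated_similarity(sig_alice, sig_bob))
print("estimated alice/carol:", estimated_similarity(sig_alice, sig_carol))
```

With this setup each user is reduced to n = 200 numbers, and the estimate for two users should land close to their exact Jaccard similarity. Storing full permutations would not scale to a real news corpus; I assume a real system would use cheap hash functions in their place.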
This is what I think it could be. In step 2, instead of using one fixed hash function, we could vary the hash function so that it returns the index of a different element. Thus, one hash function would return the index of the first item from the user's list, another hash function would return the index of the second item from the user's list, and so on. So a family of hash functions satisfying the min-wise independent permutations condition sounds like a possible approach.
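For reference, this is how I read the min-wise independence condition, written out as a short formula. The symbols are my own notation: $\pi$ is a random permutation of the news indices, $A$ and $B$ are the sets of items viewed by two users, and $h_1, \dots, h_n$ are the $n$ hash functions, with $h_i(X) = \min \pi_i(X)$.

```latex
% Min-wise independence: every element of X is equally likely to be the minimum.
\Pr\bigl[\min \pi(X) = \pi(x)\bigr] = \frac{1}{|X|}
\qquad \text{for every set } X \text{ and every } x \in X .

% Consequence: two sets share the same minimum with probability equal to their Jaccard similarity.
\Pr\bigl[\min \pi(A) = \min \pi(B)\bigr] = \frac{|A \cap B|}{|A \cup B|} = J(A, B) .

% Estimate from n independent hash functions.
\hat{J}(A, B) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\bigl[h_i(A) = h_i(B)\bigr] .
```

If this reading is right, each hash function corresponds to a different random ordering of all the news, not to the first, second, third item of one fixed ordering.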