Mahout precomputed item-item similarity - slow recommendation

I am having performance problems with precomputed item-item similarity in Mahout.

I have 4 million users and roughly the same number of items, with about 100 MB of user-item preference data. I want to make content-based recommendations based on the cosine similarity of the documents' TF-IDF vectors. Since computing this on the fly is slow, I precomputed the pairwise similarity of the 50 most similar documents as follows:

  • I used seq2sparse to create the TF-IDF vectors.
  • I used mahout rowId to create the Mahout matrix.
  • I used mahout rowSimilarity -i INPUT/matrix -o OUTPUT -r 4587604 --similarityClassname SIMILARITY_COSINE -m 50 -ess to compute the 50 most similar documents for each document.

I used Hadoop to precompute all of this. For 4 million items, the output was only 2.5 GB.
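To make the precomputation concrete, here is a minimal in-memory sketch of what the rowSimilarity step produces conceptually: for each document, the cosine similarity to the other documents, truncated to the top N. This is not Mahout's implementation, which runs as distributed Hadoop jobs. The names TopNCosine, cosine, norm and topN are made up for illustration, and a sparse TF-IDF vector is modelled as a Map<Integer, Double> from term index to weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class TopNCosine {

    // Cosine similarity of two sparse vectors (term index -> TF-IDF weight).
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // An all-zero vector has no direction; treat its similarity as 0.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Indices of the n rows most similar to row `query`, best first.
    static List<Integer> topN(List<Map<Integer, Double>> rows, int query, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (i != query) {
                candidates.add(i);
            }
        }
        Map<Integer, Double> q = rows.get(query);
        candidates.sort(
            Comparator.comparingDouble((Integer i) -> cosine(q, rows.get(i))).reversed());
        return candidates.subList(0, Math.min(n, candidates.size()));
    }
}
```

This brute-force version compares every pair of rows, which is only practical for small inputs; that is the reason the precomputation above is done as a batch job.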

I then loaded the contents of the files produced by the reducers into Collection<GenericItemSimilarity.ItemItemSimilarity> corrMatrix = ..., using docIndex to decode the document IDs. The IDs were already integers, but rowId renumbered them starting from 1, so I have to map them back.
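The decoding step can be sketched roughly as follows. This is an illustration under assumptions, not the code from the post: SimilarityDecoder, Triple and decode are hypothetical names, and both the similarity rows and the docIndex mapping are assumed to have already been read from the Hadoop output into plain Java maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SimilarityDecoder {

    // One decoded similarity entry, keyed by the original document IDs.
    static final class Triple {
        final long itemID1;
        final long itemID2;
        final double value;

        Triple(long itemID1, long itemID2, double value) {
            this.itemID1 = itemID1;
            this.itemID2 = itemID2;
            this.value = value;
        }
    }

    // rows: matrix row index -> (matrix column index -> similarity).
    // docIndex: matrix row index -> original document ID.
    static List<Triple> decode(Map<Integer, Map<Integer, Double>> rows,
                               Map<Integer, Long> docIndex) {
        List<Triple> out = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            Long id1 = docIndex.get(row.getKey());
            if (id1 == null) {
                continue; // skip rows with no known document ID
            }
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                Long id2 = docIndex.get(cell.getKey());
                if (id2 != null) {
                    out.add(new Triple(id1, id2, cell.getValue()));
                }
            }
        }
        return out;
    }
}
```

In the setup described here, each triple would then become one GenericItemSimilarity.ItemItemSimilarity(itemID1, itemID2, value) in the collection.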

For recommendation, I use the following code:

ItemSimilarity similarity = new GenericItemSimilarity(correlationMatrix);

CandidateItemsStrategy candidateItemsStrategy = new SamplingCandidateItemsStrategy(1, 1, 1, model.getNumUsers(),  model.getNumItems());
MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy = new SamplingCandidateItemsStrategy(1, 1, 1, model.getNumUsers(),  model.getNumItems());

Recommender recommender = new GenericItemBasedRecommender(model, similarity, candidateItemsStrategy, mostSimilarItemsCandidateItemsStrategy);

I am testing with a limited data model (1.6M items), but I loaded all of the pairwise item similarities into memory. Everything fits in main memory using 40 GB.

I make a recommendation for one user like this:

Recommender cachingRecommender = new CachingRecommender(recommender);
List<RecommendedItem> recommendations = cachingRecommender.recommend(userID, howMany);

The recommendation takes 554.938583083 seconds. I tried different parameters for CandidateItemsStrategy and MostSimilarItemsCandidateItemsStrategy, but performance did not improve.

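Isn't precomputing everything supposed to speed up recommendation? Could someone tell me what I am doing wrong? Also, why does loading the pairwise similarities into main memory blow up so much: 2.5 GB of files became 40 GB in the Collection<GenericItemSimilarity.ItemItemSimilarity> Mahout matrix? I know the files are serialized as IntWritable, VectorWritable hashMap key-value pairs, and that the key has to be repeated for every vector value in the ItemItemSimilarity matrix, but this still seems like too much, doesn't it?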
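A back-of-the-envelope calculation puts the two sizes side by side. It uses only figures quoted in this post (the -r 4587604 row count, -m 50, 2.5 GB on disk, 40 GB in memory) and assumes that every row really has 50 entries and that 1 GB means 10^9 bytes; the class name FootprintEstimate is made up.

```java
final class FootprintEstimate {
    static final long ROWS = 4_587_604L; // from -r 4587604
    static final int PER_ROW = 50;       // from -m 50 (assumes every row is full)

    // Upper bound on the number of stored (item, item, similarity) entries.
    static long pairCount() {
        return ROWS * PER_ROW;
    }

    // Average bytes per stored entry for a given total size in GB (10^9 bytes).
    static double bytesPerPair(double totalGigabytes) {
        return totalGigabytes * 1e9 / pairCount();
    }

    public static void main(String[] args) {
        System.out.printf("pairs: %d%n", pairCount());
        System.out.printf("on disk: %.1f bytes/pair%n", bytesPerPair(2.5));
        System.out.printf("in heap: %.1f bytes/pair%n", bytesPerPair(40.0));
    }
}
```

Under these assumptions there are about 229 million entries, which works out to roughly 11 bytes per entry on disk and roughly 174 bytes per entry in memory. For comparison, an object holding two long IDs and a double needs 24 bytes for those fields alone, before any object header, reference, or hash-table overhead.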



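It turns out the time I reported for computing a recommendation with the Collection as input was wrong. long startTime = System.nanoTime(); was at the top of my code rather than directly before List<RecommendedItem> recommendations = cachingRecommender.recommend(userID, howMany);. The measurement therefore also included loading the data set and the precomputed similarities into main memory.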
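A minimal sketch of timing only the call of interest, so that setup such as loading data is excluded from the measurement. CallTimer, Timed and time are hypothetical helper names, not part of Mahout.

```java
import java.util.concurrent.Callable;

final class CallTimer {

    // A result together with the wall-clock seconds the call took.
    static final class Timed<T> {
        final T result;
        final double seconds;

        Timed(T result, double seconds) {
            this.result = result;
            this.seconds = seconds;
        }
    }

    // Reads the clock immediately before and after the call and nothing else.
    // Checked exceptions are rethrown unchecked to keep the sketch short.
    static <T> Timed<T> time(Callable<T> call) {
        long start = System.nanoTime();
        T result;
        try {
            result = call.call();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return new Timed<>(result, seconds);
    }
}
```

With this helper, the measurement would look like CallTimer.time(() -> cachingRecommender.recommend(userID, howMany)), placed after the data model and similarities have been loaded.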

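The memory consumption, however, is real. I reduced it by using a custom ItemSimilarity and loading the precomputed similarities into a HashMap<Long, HashMap<Long, Double>>. I used the Trove library to reduce the space requirements.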

Here is the code. The custom ItemSimilarity:

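import java.util.Collection;

// Trove 3.x package names are assumed here; Trove 2.x keeps these classes in gnu.trove.
import gnu.trove.map.hash.TLongDoubleHashMap;
import gnu.trove.map.hash.TLongObjectHashMap;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class TextItemSimilarity implements ItemSimilarity{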

    private TLongObjectHashMap<TLongDoubleHashMap> correlationMatrix;

    public TextItemSimilarity(TLongObjectHashMap<TLongDoubleHashMap> correlationMatrix){
        this.correlationMatrix = correlationMatrix;
    }

    @Override
    public void refresh(Collection<Refreshable> alreadyRefreshed) {
    }

    @Override
    public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
        TLongDoubleHashMap similarToItemId1 = correlationMatrix.get(itemID1);   
        if(similarToItemId1 != null && !similarToItemId1.isEmpty() &&  similarToItemId1.contains(itemID2)){
            return similarToItemId1.get(itemID2);
        }   
        return 0;
    }
    @Override
    public double[] itemSimilarities(long itemID1, long[] itemID2s) throws TasteException {
        double[] result = new double[itemID2s.length];
        for (int i = 0; i < itemID2s.length; i++) {
            result[i] = itemSimilarity(itemID1, itemID2s[i]);
        }
        return result;
    }
    @Override
    public long[] allSimilarItemIDs(long itemID) throws TasteException {
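        TLongDoubleHashMap similar = correlationMatrix.get(itemID);
        // Items without precomputed neighbours would otherwise cause a NullPointerException.
        return similar == null ? new long[0] : similar.keys();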
    }
}
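For readers without Trove on the classpath, the same nested-map idea can be sketched with java.util collections only. This is a stand-in, not the class above: NestedSimilarityMap and its methods are hypothetical names, and storing each pair in both directions is a design choice made here so that lookups work in either order.

```java
import java.util.HashMap;
import java.util.Map;

final class NestedSimilarityMap {
    private final Map<Long, Map<Long, Double>> matrix = new HashMap<>();

    // Stores the pair in both directions so lookups work in either order.
    void put(long itemID1, long itemID2, double similarity) {
        matrix.computeIfAbsent(itemID1, k -> new HashMap<>()).put(itemID2, similarity);
        matrix.computeIfAbsent(itemID2, k -> new HashMap<>()).put(itemID1, similarity);
    }

    // Returns 0 for pairs that were not precomputed, like the class above.
    double get(long itemID1, long itemID2) {
        Map<Long, Double> row = matrix.get(itemID1);
        if (row == null) {
            return 0.0;
        }
        return row.getOrDefault(itemID2, 0.0);
    }

    // IDs of all items with a stored similarity to itemID; empty if none.
    long[] similarTo(long itemID) {
        Map<Long, Double> row = matrix.get(itemID);
        if (row == null) {
            return new long[0];
        }
        return row.keySet().stream().mapToLong(Long::longValue).toArray();
    }
}
```

The boxed Long and Double objects in this version cost more memory per entry than primitive-keyed maps, which is the motivation for using Trove in the class above.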

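Total memory consumption, including my data set, is 30 GB with Collection<GenericItemSimilarity.ItemItemSimilarity>, and 17 GB with TLongObjectHashMap<TLongDoubleHashMap> and the custom TextItemSimilarity. A recommendation takes 0.05 seconds with Collection<GenericItemSimilarity.ItemItemSimilarity> and 0.07 seconds with TLongObjectHashMap<TLongDoubleHashMap>. I also believe that CandidateItemsStrategy and MostSimilarItemsCandidateItemsStrategy play a big role in performance.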

So if you want to save memory, use a Trove HashMap, and if you want slightly better speed, use Collection<GenericItemSimilarity.ItemItemSimilarity>.


Source: https://habr.com/ru/post/1584357/

