Testing and training using different data with MAHOUT

Sorry if this is a noob question, but I'm new to Mahout and I need to run some tests with the MovieLens datasets. What I would like to know is whether it is possible to train a recommender with u1base.csv and test its recommendations with u1test.csv to determine precision and recall.

The evaluation examples I found only split a single data set, but I want to use u1base to train and u1test to test.

u1base.csv and u1test.csv have the same format "UserId, Item, Rating".

The Java code I have is:

    File userPreferencesFile = new File("u1base.csv");
    File userTeste = new File("u1test.csv");
    RandomUtils.useTestSeed();

    DataModel dataModel = new FileDataModel(userPreferencesFile);
    DataModel testModel = new FileDataModel(userTeste);

    RecommenderIRStatsEvaluator recommenderEvaluator = new GenericRecommenderIRStatsEvaluator();

    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
            UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(10, userSimilarity, dataModel);

            return new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);
        }
    };

    IRStatistics statistics = recommenderEvaluator.evaluate(
            recommenderBuilder, null, dataModel, null, 2,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.format("The recommender precision is %f%n", statistics.getPrecision());
    System.out.format("The recommender recall is %f%n", statistics.getRecall());

Any help would be much appreciated.

1 answer

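GenericRecommenderIRStatsEvaluator does not accept separate training and test sets; its evaluate method takes a single DataModel. To evaluate against separate test data, you need to write your own IRStatsEvaluator.

Precision and recall are computed per user, i.e. from the top-N items recommended to that user (for example, N = 10) and the items that are relevant to that user. For example: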

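A = recommended items = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

B = relevant items = {1, 2, 11, 12, 13}

Recall - of the relevant items, how many were recommended, i.e. recall = count(A ∩ B)/count(B) = 2 out of 5, i.e. 0.4

Precision - of the recommended items, how many are relevant, i.e. precision = count(A ∩ B)/count(A) = 2 out of 10, i.e. 0.2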
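The arithmetic above can be checked with a few lines of plain Java. This is only a sketch: it assumes A is the set of recommended items and B the set of relevant items, and the class and method names (`PrecisionRecallExample`, `hits`, `precision`, `recall`) are made up for illustration, not part of Mahout.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PrecisionRecallExample {

    // Number of items that are both recommended and relevant: count(A ∩ B).
    static int hits(Set<Long> recommended, Set<Long> relevant) {
        Set<Long> common = new HashSet<>(recommended);
        common.retainAll(relevant);
        return common.size();
    }

    // Precision = count(A ∩ B) / count(A); defined as 0 when nothing was recommended.
    static double precision(Set<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall = count(A ∩ B) / count(B); defined as 0 when nothing is relevant.
    static double recall(Set<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // A: assumed to be the recommended items.
        Set<Long> a = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L));
        // B: assumed to be the relevant items.
        Set<Long> b = new HashSet<>(Arrays.asList(1L, 2L, 11L, 12L, 13L));

        System.out.println("precision = " + precision(a, b));
        System.out.println("recall    = " + recall(a, b));
    }
}
```

Only items 1 and 2 appear in both sets, so precision is 2/10 and recall is 2/5.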

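So you need two sets per user: the recommended items and the relevant items. The stock IRStatsEvaluator takes the relevant items from the same datamodel that it trains on. To evaluate against separate test data:

  • Take the relevant items from the test datamodel instead.

In the code, this comes down to the datamodel passed to dataSplitter.getRelevantItemsIDs().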

//GenericRecommenderIRStatsEvaluator
public IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                               DataModelBuilder dataModelBuilder,
                               DataModel dataModel,
                               IDRescorer rescorer,
                               int at,
                               double relevanceThreshold,
                               double evaluationPercentage) throws TasteException {
    .......
    FastIDSet relevantItemIDs = dataSplitter.getRelevantItemsIDs(userID, at, theRelevanceThreshold, dataModel);
    .......

}

//CustomizedRecommenderIRStatsEvaluator    
public IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                               DataModelBuilder dataModelBuilder,
                               DataModel trainDataModel,
                               DataModel testDataModel,
                               IDRescorer rescorer,
                               int at,
                               double relevanceThreshold,
                               double evaluationPercentage) throws TasteException {
    .......
    FastIDSet relevantItemIDs = dataSplitter.getRelevantItemsIDs(userID, at, theRelevanceThreshold, testDataModel);
    .......

}
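To make the idea concrete without depending on Mahout's classes, here is a library-independent sketch of the same evaluation loop. Everything in it is an assumption for illustration: `HeldOutIRStats`, `Result` and the `BiFunction` stand-in for a recommender trained on the training data are hypothetical names, the relevance threshold is a fixed rating rather than Mahout's `CHOOSE_THRESHOLD`, and users with no relevant test items are simply skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

class HeldOutIRStats {

    // Precision and recall averaged over all evaluated users.
    static final class Result {
        final double precision;
        final double recall;

        Result(double precision, double recall) {
            this.precision = precision;
            this.recall = recall;
        }
    }

    /**
     * recommender: hypothetical stand-in for a recommender built from the training data;
     *              given (userId, at) it returns up to `at` distinct item IDs.
     * testRatings: userId -> (itemId -> rating), loaded from the test data.
     */
    static Result evaluate(BiFunction<Long, Integer, List<Long>> recommender,
                           Map<Long, Map<Long, Double>> testRatings,
                           int at,
                           double relevanceThreshold) {
        double precisionSum = 0.0;
        double recallSum = 0.0;
        int users = 0;

        for (Map.Entry<Long, Map<Long, Double>> user : testRatings.entrySet()) {
            // Relevant items come from the TEST data: ratings at or above the threshold.
            Set<Long> relevant = new HashSet<>();
            for (Map.Entry<Long, Double> rating : user.getValue().entrySet()) {
                if (rating.getValue() >= relevanceThreshold) {
                    relevant.add(rating.getKey());
                }
            }
            if (relevant.isEmpty()) {
                continue; // nothing to measure for this user
            }

            // Recommended items come from the recommender trained on the TRAINING data.
            List<Long> recommended = recommender.apply(user.getKey(), at);
            int hits = 0;
            for (Long item : recommended) {
                if (relevant.contains(item)) {
                    hits++;
                }
            }

            // A user with no recommendations contributes 0 to both averages.
            precisionSum += recommended.isEmpty() ? 0.0 : (double) hits / recommended.size();
            recallSum += (double) hits / relevant.size();
            users++;
        }

        return users == 0
                ? new Result(Double.NaN, Double.NaN)
                : new Result(precisionSum / users, recallSum / users);
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(1L, 5.0);
        user1.put(2L, 4.0);
        user1.put(11L, 5.0);
        user1.put(12L, 4.0);
        user1.put(13L, 5.0);
        user1.put(20L, 1.0); // below the threshold, so not relevant

        Map<Long, Map<Long, Double>> testRatings = new HashMap<>();
        testRatings.put(1L, user1);

        // Fixed recommendations, standing in for a trained recommender.
        BiFunction<Long, Integer, List<Long>> fixed =
                (userId, n) -> Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);

        Result r = evaluate(fixed, testRatings, 10, 4.0);
        System.out.println("precision = " + r.precision + ", recall = " + r.recall);
    }
}
```

In a real Mahout evaluator the `BiFunction` would be replaced by the Recommender returned from the RecommenderBuilder, and the relevant items would come from dataSplitter.getRelevantItemsIDs() called with the test datamodel, as in the customized signature above.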



Source: https://habr.com/ru/post/1523904/

