Your data is textual and word-like, so the 20 Newsgroups example is probably a good source of inspiration. It is a solid example, and you can easily reproduce its code with your CSV file.
Here is a working link to the 20 Newsgroups example for the latest Mahout version:
https://github.com/jpatanooga/MahoutExamples/blob/master/src/main/java/com/cloudera/mahout/classification/sgd/TwentyNewsgroups.java
The only adaptation needed is in the countWords method, where the way the TokenStream object is used has changed. Here is working code for the latest version of Mahout:
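import java.io.IOException;
import java.io.Reader;
import java.util.Collection;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

private static void countWords(Analyzer analyzer, Collection<String> words, Reader in) throws IOException {
    TokenStream ts = analyzer.tokenStream("text", in);
    // addAttribute returns the attribute instance, so it can be reused for every token
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset(); // must be called before the first incrementToken()
    while (ts.incrementToken()) {
        words.add(term.toString());
    }
    ts.end();
    ts.close();
}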
Hope this helps. I adapted this example to read a CSV file, and it worked.
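For the CSV part, here is a minimal sketch of how each row can be turned into a label plus a list of words, roughly the role that one file per newsgroup directory plays in the original example. It is not the code I actually used: the names `CsvWordCounter`, `Row`, `readRows` and `tokenize` are made up for illustration, the file is assumed to have two columns (`label,text`) with no commas inside the label, and `tokenize` is a plain regex stand-in so the sketch runs without Lucene on the classpath. In a real adaptation you would call `countWords(analyzer, words, new StringReader(text))` there instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Mahout or Lucene.
class CsvWordCounter {

    /** One CSV row: the class label and the tokens extracted from its text. */
    static final class Row {
        final String label;
        final List<String> words;

        Row(String label, List<String> words) {
            this.label = label;
            this.words = words;
        }
    }

    // Stand-in for the Lucene-based countWords: lowercases the text and
    // splits it on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Assumes each line is "label,text"; everything after the first comma is
    // treated as the text. Lines without a comma are skipped.
    static List<Row> readRows(Reader in) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue;
                }
                String label = line.substring(0, comma).trim();
                rows.add(new Row(label, tokenize(line.substring(comma + 1))));
            }
        }
        return rows;
    }
}
```

Each `Row` then takes the place of one newsgroup file: the label gives the target category and the words are what gets fed to the feature encoder. If your CSV has quoted fields or a header line, a proper CSV parser would be a safer choice than the `indexOf(',')` split used here.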