Wow, this is a pretty huge topic you've decided to take on :) There are definitely many books and articles you can read about it, but I'll try to give a brief introduction. I'm not a great expert, but I have worked on some of these things.
Two approaches worth knowing about here are Bayes classification and TF-IDF. For Bayes, there is a write-up at http://arubyguy.com/2011/03/03/bayes-classification-update/.
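To make the Bayes idea a bit more concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is not the code from the linked article; the class name `NaiveBayes`, its methods, and the add-one smoothing are my own assumptions, chosen to keep the example short.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical minimal naive Bayes classifier over token lists."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocab = set()                       # every word seen in training

    def train(self, category, tokens):
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.vocab.update(tokens)

    def classify(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            # Log prior: how common is this category in the training data?
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for token in tokens:
                # Log likelihood with add-one smoothing (an assumption here),
                # so unseen words do not zero out the whole score.
                count = self.word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


nb = NaiveBayes()
nb.train("sports", "the team won the match".split())
nb.train("tech", "the new phone has a fast chip".split())
sports_guess = nb.classify("team match".split())
tech_guess = nb.classify("fast phone".split())
```

With this toy training data, `sports_guess` comes out as "sports" and `tech_guess` as "tech". A real setup would need far more training text per category, plus some tokenization and stop-word handling.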
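TF-IDF stands for Term Frequency - Inverse Document Frequency. The idea is that a term gets a high weight for a document when it occurs often in that document but rarely in the rest of the collection. So for a document D, the highest-weighted terms T1, T2, T3 are the ones that best describe what D is about.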
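Here is a rough sketch of how TF-IDF scores can be computed, in Python. The function name `tf_idf` and the exact weighting (term count divided by document length, times log(N / df)) are assumptions on my part; there are several common variants, and libraries tend to use smoothed versions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. tf is the term count divided by the
    document length; idf is log(N / df), where df is the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell".split(),
]
scores = tf_idf(docs)
# Top three terms of the third document, highest score first.
top = sorted(scores[2], key=scores[2].get, reverse=True)[:3]
```

In this toy collection "the" appears in every document, so its score is 0 everywhere, while "stock", "market" and "fell" only appear in the third document and end up as its top terms.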
If you are working in Java, have a look at Lucene. You can also Google "TF-IDF" to find more.