Document analysis and labeling

Let's say I have a large number of essays (thousands) that I want to tag, classify, etc. Ideally, I would like to train something by manually classifying / tagging a few hundred of them, and then let it loose on the rest.

What resources (books, blogs, languages) would you recommend for such a task? Part of me thinks this would be a good fit for a Bayesian classifier or even Latent Semantic Analysis, but I am not really familiar with either beyond what I have found in a few Ruby gems.

Could a problem like this be solved with a Bayesian classifier? Should I look more at semantic analysis / natural language processing? Or should I just look at keyword density and map from there?

Any suggestions are appreciated (I don't mind picking up a few books if necessary)!

+3
2 answers

Wow, this is a pretty huge topic that you are taking on :) There are definitely many books and articles you can read about it, but I will try to give a brief introduction. I am not a great expert, but I have worked on some of these things.

[...] Bayes and TF-IDF. [...] For Bayes classification, see http://arubyguy.com/2011/03/03/bayes-classification-update/ [...]
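As a rough illustration of the Bayes approach named above, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. It is not taken from the linked post; the `NaiveBayes` class, the `tokenize` helper, and the toy training sentences are all hypothetical and only meant to show the "label a few by hand, then classify the rest" idea from the question.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training docs
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods avoids float underflow.
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data standing in for hand-labeled essays.
nb = NaiveBayes()
nb.train("the team won the match with a late goal", "sports")
nb.train("the striker scored twice in the final", "sports")
nb.train("the senate passed the budget bill", "politics")
nb.train("the minister announced a new tax policy", "politics")

label = nb.classify("a late goal decided the final")
print(label)
```

With a few hundred hand-labeled essays in place of the toy sentences, the same `train` / `classify` loop applies; a real setup would probably also want stop-word removal and a held-out set to check accuracy.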

TF-IDF stands for Term Frequency - Inverse Document Frequency. [...] a document D and terms T1, T2, T3 [...]
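To make the term frequency / inverse document frequency idea concrete, here is a small sketch in plain Python. It assumes one common textbook weighting (term share of the document multiplied by log(N / document frequency)); other variants exist, and the `tf_idf` function and the three toy documents are hypothetical, not something from this answer.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. tf is the term's share of the document;
    idf is log(N / number of documents that contain the term).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up documents.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell sharply".split(),
]
weights = tf_idf(docs)

for w in weights:
    # Show each document's terms from most to least distinctive.
    print(sorted(w, key=w.get, reverse=True))
```

A term that appears in every document (here "the") gets weight zero, while a term confined to one document scores highest, which is what makes the highest-weighted terms usable as candidate tags.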

[...] Java, Lucene [...] API [...] Google "TF-IDF" [...]

+5

Source: https://habr.com/ru/post/1794492/

