Document analysis and labeling

Question


Let's say I have a large number of essays (thousands) that I want to tag, categorize, etc. Ideally, I'd like to train something by manually categorizing/tagging a few hundred, and then let it loose.

What resources (books, blogs, languages) would you recommend for a task like this? Part of me thinks it would be a good fit for a Bayesian classifier or even Latent Semantic Analysis, but I'm not really familiar with either beyond what I've found in a few Ruby gems.

Can a problem like this be solved with a Bayesian classifier? Should I be looking more at semantic analysis / natural language processing? Or should I just look at keyword density and mapping from there?

Any suggestions are appreciated (I don't mind picking up a few books if needed)!

+3

nlp classification tagging bayesian

jerhinesmith Feb 24 '11 at 16:20


2 answers

- ( ), - . , , Google . , PHP , Java .

http://en.wikipedia.org/wiki/Vector_space_model

http://www.la2600.org/talks/files/20040102/Vector_Space_Search_Engine_Theory.pdf

+1

brainwash Feb 24 '11 at 16:26

Gregory Mostizky · Accepted Answer · 2011-03-04T08:56:46+0000

Wow, that's a pretty huge topic you are venturing into :) There are definitely plenty of books and articles you can read about it, but I will try to give a short introduction. I am not a big expert, but I have worked on some of this stuff.

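First, you need to decide whether you want to sort the essays into predefined topics/categories (a classification problem) or let the algorithm come up with the groups on its own (a clustering problem). From your description, it sounds like you are interested in classification.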

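Now, for classification, you first need to create enough training data. You need a number of essays that are already separated into groups. For example, 5 essays on one topic, 5 on another, 5 on a third, etc. In general, you want as much training data as possible, but how much is enough depends on the specific algorithm. You also need verification data, which is similar to the training data but kept completely separate. This data is used to judge the quality (or performance) of your algorithm.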
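Here is a minimal sketch of keeping part of the hand-labelled essays aside as verification data and scoring a classifier against it. The names `split_labelled` and `accuracy`, the 20% hold-out fraction, and the `(text, label)` tuple layout are assumptions made for illustration; they do not come from any particular library.

```python
import random


def split_labelled(essays, verification_fraction=0.2, seed=0):
    """Shuffle (text, label) pairs and split them into training and verification lists."""
    shuffled = list(essays)
    random.Random(seed).shuffle(shuffled)
    # Number of essays held back for verification (assumed fraction, tune as needed).
    n_verification = round(len(shuffled) * verification_fraction)
    cut = len(shuffled) - n_verification
    return shuffled[:cut], shuffled[cut:]


def accuracy(classify, verification):
    """Fraction of verification essays whose predicted label equals the manual label."""
    if not verification:
        raise ValueError("verification set is empty")
    correct = sum(1 for text, label in verification if classify(text) == label)
    return correct / len(verification)
```

Here `classify` is any function that takes an essay's text and returns a label, so the same check can be reused to compare different algorithms on identical held-out essays.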

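Finally, the algorithms themselves. The two that I am familiar with are Bayes-based and TF-IDF-based. For Bayes, I have written up my own experience on my blog. If you are interested, read this - http://arubyguy.com/2011/03/03/bayes-classification-update/ - and if you have follow-up questions, I will try to answer.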
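To give a feel for the Bayes-based approach, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. It is not the implementation from the linked blog post. The class name `NaiveBayes`, the `tokenize` helper, and the crude regex tokenizer are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training essays
        self.vocabulary = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        if not self.doc_counts:
            raise ValueError("train the classifier first")
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary) or 1
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of log likelihoods, with add-one smoothing
            # so that unseen words do not zero out a category.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A typical use would be to call `train(text, label)` once per manually tagged essay and then `classify(text)` on each remaining essay.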

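TF-IDF is short for Term Frequency - Inverse Document Frequency. Basically, the idea is to find, for any given document, the documents in the training set that are most similar to it, and then work out its category from those. For example, if document D is similar to T1 and T2, which are in one category, and to T3, which is in another, you guess that D most likely belongs to the first category, with a little of the second.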

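The way this works is that rare words are given the most weight and common words almost none. For instance, a rare, topic-specific term says a lot about a document, while a very common word says very little. (That is the "inverse document frequency" part.) If you can work with Java, there is the Lucene library, which provides most of this out of the box. Look for its API for finding similar documents and see how it is implemented. Or just Google "TF-IDF".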
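As a rough illustration of the TF-IDF idea, the sketch below weights words by `log(N / df)`, compares documents by cosine similarity, and lets the `k` most similar training essays vote on the category. This is one common formulation, not Lucene's implementation. The function names, the unsmoothed IDF formula, and `k=3` are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def inverse_document_frequency(token_lists):
    """idf(word) = log(N / df); a word that appears in every document gets weight 0."""
    n_docs = len(token_lists)
    doc_freq = Counter(word for tokens in token_lists for word in set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse vector: term count times idf; words unseen in training are ignored."""
    return {w: c * idf[w] for w, c in Counter(tokens).items() if w in idf}


def cosine(a, b):
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, training, k=3):
    """training is a list of (text, label); the k nearest essays vote by similarity."""
    if not training:
        raise ValueError("training set is empty")
    token_lists = [tokenize(t) for t, _ in training]
    idf = inverse_document_frequency(token_lists)
    query = tfidf_vector(tokenize(text), idf)
    scored = [
        (cosine(query, tfidf_vector(tokens, idf)), label)
        for tokens, (_, label) in zip(token_lists, training)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]
```

Recomputing the IDF table on every call keeps the sketch short. With thousands of essays you would compute the training vectors once and reuse them.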
