Which classification algorithm to choose?

I would like to classify text documents into four categories. I also have many samples that are already classified, which can be used for training. I would like the algorithm to learn on the fly. Please suggest an optimal algorithm that works for this requirement.

+4
4 answers

If by "on the fly" you mean online learning (where training and classification can alternate), I suggest the k-nearest neighbor algorithm. It is available in Weka and in the TiMBL package.
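To illustrate why k-nearest neighbors fits online learning: training just means storing the labeled document, so learning and classification can be interleaved freely. Below is a minimal pure-Python sketch, not the Weka or TiMBL implementation. The class name `OnlineKNN`, the whitespace tokenizer, the cosine similarity measure, and the example labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineKNN:
    """Hypothetical k-NN text classifier that learns one document at a time."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # stored (vector, label) pairs

    def learn(self, text, label):
        # "Training" is just remembering the labeled example.
        self.memory.append((vectorize(text), label))

    def classify(self, text):
        if not self.memory:
            return None
        query = vectorize(text)
        # Take the k most similar stored documents and let them vote.
        nearest = sorted(
            self.memory, key=lambda item: cosine(query, item[0]), reverse=True
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Learning and classification alternate (labels here are made up).
knn = OnlineKNN(k=1)
knn.learn("the team won the football match", "sports")
knn.learn("parliament passed the new tax bill", "politics")
print(knn.classify("football match tonight"))
knn.learn("doctors recommend the flu vaccine", "health")
print(knn.classify("flu vaccine shortage"))
```

The trade-off is that every stored document is compared at classification time, so memory and query cost grow with the number of training samples.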

A perceptron can also do this.
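For the perceptron, online learning means updating the weights only when the current prediction is wrong. A rough sketch of a multiclass perceptron over word features follows. The class name `OnlinePerceptron`, the tokenizer, and the four labels are hypothetical, and a real system would likely use a library implementation with averaging or a learning rate.

```python
class OnlinePerceptron:
    """Hypothetical multiclass perceptron with one weight vector per class."""

    def __init__(self, classes):
        self.classes = list(classes)
        # weights[label][token] -> float; missing tokens count as 0.0
        self.weights = {c: {} for c in self.classes}

    def _score(self, label, tokens):
        w = self.weights[label]
        return sum(w.get(t, 0.0) for t in tokens)

    def classify(self, text):
        tokens = text.lower().split()
        # Ties are broken in favor of the class listed first.
        return max(self.classes, key=lambda c: self._score(c, tokens))

    def learn(self, text, label):
        tokens = text.lower().split()
        predicted = self.classify(text)
        if predicted == label:
            return  # correct prediction: no update
        # Mistake-driven update: reward the true class, penalize the guess.
        for t in tokens:
            self.weights[label][t] = self.weights[label].get(t, 0.0) + 1.0
            self.weights[predicted][t] = self.weights[predicted].get(t, 0.0) - 1.0


model = OnlinePerceptron(["sports", "politics", "tech", "health"])
model.learn("the team won the football match", "sports")
model.learn("parliament passed the new tax bill", "politics")
model.learn("new laptop with a faster processor", "tech")
model.learn("doctors recommend the flu vaccine", "health")
print(model.classify("flu vaccine"))
```

Unlike k-NN, the model size depends on the vocabulary rather than on the number of documents seen, and each update touches only the words in the current document.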

"Optimal" is not a well-defined term in this context.

+4

There are several algorithms that can learn on the fly. Examples: k-nearest neighbors, naive Bayes, neural networks. You can try each of these methods on a sample of your data to see how well it fits.
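As one way to try a method on a sample, here is a sketch of an incremental naive Bayes classifier together with a "test-then-train" loop that predicts each document before learning from it. This is only an illustration under assumptions: the names `OnlineNaiveBayes` and `progressive_accuracy`, the whitespace tokenizer, and the add-one (Laplace) smoothing are my choices, not something prescribed above.

```python
import math
from collections import Counter


class OnlineNaiveBayes:
    """Hypothetical multinomial naive Bayes that updates counts per document."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.doc_counts = Counter()  # documents seen per class
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocabulary = set()

    def learn(self, text, label):
        tokens = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary) or 1

        def log_posterior(c):
            # Add-one smoothing for both the class prior and word likelihoods.
            score = math.log(
                (self.doc_counts[c] + 1) / (total_docs + len(self.classes))
            )
            total_words = sum(self.word_counts[c].values())
            for t in tokens:
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total_words + vocab_size)
                )
            return score

        return max(self.classes, key=log_posterior)


def progressive_accuracy(model, stream):
    """Predict each (text, label) pair first, then learn from it."""
    correct = 0
    total = 0
    for text, label in stream:
        correct += model.classify(text) == label
        model.learn(text, label)
        total += 1
    return correct / total if total else 0.0
```

Any model exposing `learn` and `classify` can be passed to `progressive_accuracy`, so the same labeled stream can be used to compare candidates.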

+3

Since you have unlabeled data, you can use a model that benefits from it. The first thing that comes to my mind is nonlinear NCA: "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure" (Salakhutdinov, Hinton).

+1

Well... I have to say that document classification is somewhat different from what you guys are thinking about.

As a rule, in document classification the test data after preprocessing is extremely large, for example O(N^2)... so it can be too computationally expensive.

Another typical classifier that comes to my mind is the discriminative classifier... which does not need a generative model of your dataset. After training, all you need to do is feed a single record into the algorithm, and it will be classified.

Good luck with that. For example, you can check E. Alpaydin's book "Introduction to Machine Learning".

0

Source: https://habr.com/ru/post/1339664/

