What is the best open source package for clustering documents?

Which open source package is best for clustering a large corpus of documents? It should either determine the number of clusters on its own or accept it as a parameter.

We have a large body of documents that do not really revolve around a single topic: they are written by sales and management staff for various projects and clients in the organization. I know that performance will deteriorate with such a wide-ranging corpus, but we are willing to live with the best we can get. So, what is the best we can get? :-)

+6
1 answer

Here is a list of topic modeling software from the site of an expert in this field: http://www.cs.princeton.edu/~blei/topicmodeling.html

An open source toolbox from a competing leading group (the Stanford Topic Modeling Toolbox): http://nlp.stanford.edu/software/tmt/tmt-0.3/

Another open source Java project (MALLET): http://mallet.cs.umass.edu/topics.php
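
To give a feel for what these topic modeling tools do with the "number of clusters" parameter, here is a rough sketch of a toy collapsed Gibbs sampler for LDA in plain Python. It is not the code or API of any package linked above; the function name `lda_gibbs`, the hyperparameter defaults, and the sample documents are all made up for illustration, and a real corpus would need one of the real tools.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, hypothetical helper).

    docs is a list of token lists; num_topics is chosen by the caller.
    Returns one hard label per document plus the document-topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assignments = []

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Use each document's most frequent topic as its cluster label.
    labels = [max(range(num_topics), key=row.__getitem__) for row in doc_topic]
    return labels, doc_topic


# Made-up sample documents, already tokenized.
docs = [
    "invoice payment client contract".split(),
    "client contract payment terms".split(),
    "server deploy release build".split(),
    "build release server rollback".split(),
]
labels, doc_topic = lda_gibbs(docs, num_topics=2)
print(labels)
```

Here the number of topics is passed in by the caller. The sampler is randomized, so which documents end up together can vary with the seed and the data, especially on a tiny example like this one.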

+4

Source: https://habr.com/ru/post/899209/

