Which open-source package is best for clustering a large corpus of documents? It should either determine the number of clusters on its own or accept it as a parameter.
We have a large body of documents that do not really revolve around a single topic: they were created by people doing sales and management work for various projects and clients in the organization. I know that clustering quality will deteriorate with such a broad corpus, but we are willing to live with the best we can get. So, what is the best we can get? :-)