I have a bunch of sentences and I want to cluster them using scikit-learn spectral clustering. The code runs without errors, but every run produces different clusters. I know this is an initialization problem, but I don't know how to fix it. This is the part of my code that works with the sentences:
    vectorizer = TfidfVectorizer(norm='l2', sublinear_tf=True, tokenizer=tokenize,
                                 stop_words='english', charset_error="ignore",
                                 ngram_range=(1, 5), min_df=1)
    X = vectorizer.fit_transform(data)
    # connectivity matrix for structured Ward
    connectivity = kneighbors_graph(X, n_neighbors=5)
    # make connectivity symmetric
    connectivity = 0.5 * (connectivity + connectivity.T)
    distances = euclidean_distances(X)
    spectral = cluster.SpectralClustering(n_clusters=number_of_k, eigen_solver='arpack',
                                          affinity="nearest_neighbors",
                                          assign_labels="discretize")
    spectral.fit(X)
Here `data` is a list of sentences. Every time the code runs, the clustering results are different. How can I get consistent results from spectral clustering? I have the same problem with KMeans. This is my KMeans code:
    vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words='english',
                                 charset_error="ignore")
    X_data = vectorizer.fit_transform(data)
    km = KMeans(n_clusters=number_of_k, init='k-means++', max_iter=100,
                n_init=1, verbose=0)
    km.fit(X_data)
I appreciate your help.
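If the variation does come from random initialization, one common remedy is to pass a fixed `random_state` to both estimators, which both `KMeans` and `SpectralClustering` accept. The sketch below is only an illustration under that assumption: the `sentences` list, the `SEED` value, and the helper names `kmeans_labels` and `spectral_labels` are hypothetical stand-ins, not part of the code above, and `n_init=10` is used here instead of `n_init=1` so that KMeans keeps the best of several seeded starts.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy stand-in for the `data` list in the question.
sentences = [
    "the cat sat on the mat",
    "a cat chased the dog",
    "the dog barked at the cat",
    "my dog sleeps on the mat",
    "stocks fell as the market dropped",
    "the market rallied and stocks rose",
    "investors sold stocks in the market",
    "the market closed higher today",
]
n_clusters = 2
SEED = 42  # any fixed integer works; the value itself is arbitrary

X = TfidfVectorizer(sublinear_tf=True, stop_words="english").fit_transform(sentences)


def kmeans_labels(seed):
    # random_state seeds the k-means++ initialization of the centroids.
    km = KMeans(n_clusters=n_clusters, init="k-means++", max_iter=100,
                n_init=10, random_state=seed)
    return km.fit_predict(X)


def spectral_labels(seed):
    # random_state seeds the eigen-solver start vector and the label assignment.
    sc = SpectralClustering(n_clusters=n_clusters, eigen_solver="arpack",
                            affinity="nearest_neighbors", n_neighbors=5,
                            assign_labels="discretize", random_state=seed)
    return sc.fit_predict(X)


# Two independent fits with the same seed should give identical labels.
km_first, km_second = kmeans_labels(SEED), kmeans_labels(SEED)
sc_first, sc_second = spectral_labels(SEED), spectral_labels(SEED)

print("KMeans repeatable:  ", (km_first == km_second).all())
print("Spectral repeatable:", (sc_first == sc_second).all())
```

A fixed seed makes the output repeatable, not necessarily better: a different seed can still give a different partition, so it is worth checking how stable the clusters are across a few seeds before relying on one.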