sklearn KMeans: how to avoid a MemoryError or ValueError?

I am working on an image classification problem, and I am building a bag-of-visual-words model. To do this, I extracted the SIFT descriptors of all my images, and I need to run the KMeans algorithm on them to find the centers to use as my visual vocabulary.

Here are the data I have:

  • Number of images: 1,584
  • SIFT descriptors (32-element vectors): 571,685
  • Number of centers: 15,840

So, I launched the KMeans algorithm to calculate my centers:

import os
import pickle

import numpy as np
from sklearn.cluster import KMeans

dico = pickle.load(open('./dico.bin', 'rb'))  # np.shape(dico) = (571685, 32)
k = np.size(os.listdir(img_path)) * 10  # = 1584 * 10 = 15840

kmeans = KMeans(n_clusters=k, n_init=1, verbose=1).fit(dico)

pickle.dump(kmeans, open('./kmeans.bin', 'wb'))
pickle.dump(kmeans.cluster_centers_, open('./dico_reduit.bin', 'wb'))

With this code I got a MemoryError, because I do not have enough RAM on my laptop (only 2 GB). So I halved the number of centers and selected a random half of the SIFT descriptors, but this time I got ValueError: array is too big.
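A back-of-envelope estimate shows why full-batch KMeans fails here: scoring every sample against every center involves an n_samples × n_clusters distance matrix. Assuming float64 (8 bytes per entry; the exact allocation strategy depends on the scikit-learn version), that comes to:

```python
# Rough memory estimate for full-batch KMeans on this dataset.
n_samples = 571_685      # number of SIFT descriptors
n_clusters = 15_840      # desired centers
bytes_per_float = 8      # float64

# Size of one n_samples x n_clusters distance matrix
distance_matrix_bytes = n_samples * n_clusters * bytes_per_float
print(f"{distance_matrix_bytes / 1024**3:.1f} GiB")  # ~67 GiB
```

That is far beyond 2 GB of RAM, and even halving both dimensions only brings it down by a factor of four.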

Is there any way to compute the centers without running into these errors?


As @sascha suggested in the comments, switching to MiniBatchKMeans solved my problem:

import os
import pickle

import numpy as np
from sklearn.cluster import MiniBatchKMeans

dico = pickle.load(open('./dico.bin', 'rb'))

k = np.size(os.listdir(img_path)) * 10  # same 15840 centers as before
batch_size = np.size(os.listdir(img_path)) * 3
kmeans = MiniBatchKMeans(n_clusters=k, batch_size=batch_size, verbose=1).fit(dico)

pickle.dump(kmeans, open('./minibatchkmeans.bin', 'wb'))
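If even loading the whole descriptor array at once becomes a problem, MiniBatchKMeans also supports partial_fit, so the data can be fed chunk by chunk. A minimal sketch with random stand-in data (the array shape, chunk size, and cluster count here are illustrative, not the values from the question):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for the real descriptor array loaded from disk.
rng = np.random.default_rng(0)
dico = rng.random((10_000, 32)).astype(np.float32)

kmeans = MiniBatchKMeans(n_clusters=100, batch_size=1_000, n_init=1, random_state=0)

# Stream the descriptors in chunks instead of calling fit() on everything.
chunk = 1_000
for start in range(0, len(dico), chunk):
    kmeans.partial_fit(dico[start:start + chunk])

print(kmeans.cluster_centers_.shape)  # (100, 32)
```

With partial_fit, each chunk could even be loaded from disk separately, so peak memory stays at one chunk plus the centers.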

Source: https://habr.com/ru/post/1666710/

