I am writing an agglomerative clustering algorithm in Java and am having trouble with a remove operation. It always seems to fail once the number of clusters reaches half the original number.
In the example code below, clusters is a Collection<Collection<Integer>>.
    while (clusters.size() > K) {
        Collection<Integer> minclust1 = null;
        Collection<Integer> minclust2 = null;
        double mindist = Double.POSITIVE_INFINITY;
        for (Collection<Integer> cluster1 : clusters) {
            for (Collection<Integer> cluster2 : clusters) {
                if (cluster1 != cluster2 && getDistance(cluster1, cluster2) < mindist) {
                    minclust1 = cluster1;
                    minclust2 = cluster2;
                    mindist = getDistance(cluster1, cluster2);
                }
            }
        }
        minclust1.addAll(minclust2);
        clusters.remove(minclust2);
    }
After several runs of the loop, clusters.remove(minclust2) eventually returns false, but I don't understand why.
I tested this code by first creating 10 clusters, each containing a single integer from 1 to 10. Distances are random numbers between 0 and 1. Here is the output after adding a few println statements. After the number of clusters, I print the actual clusters, the merge operation, and the result of clusters.remove(minclust2).
Clustering: 10 clusters
[[3], [1], [10], [5], [9], [7], [2], [4], [6], [8]]
[5] <- [6]
true
Clustering: 9 clusters
[[3], [1], [10], [5, 6], [9], [7], [2], [4], [8]]
[7] <- [8]
true
Clustering: 8 clusters
[[3], [1], [10], [5, 6], [9], [7, 8], [2], [4]]
[10] <- [9]
true
Clustering: 7 clusters
[[3], [1], [10, 9], [5, 6], [7, 8], [2], [4]]
[5, 6] <- [4]
true
Clustering: 6 clusters
[[3], [1], [10, 9], [5, 6, 4], [7, 8], [2]]
[3] <- [2]
true
Clustering: 5 clusters
[[3, 2], [1], [10, 9], [5, 6, 4], [7, 8]]
[10, 9] <- [5, 6, 4]
false
Clustering: 5 clusters
[[3, 2], [1], [10, 9, 5, 6, 4], [5, 6, 4], [7, 8]]
[10, 9, 5, 6, 4] <- [5, 6, 4]
false
Clustering: 5 clusters
[[3, 2], [1], [10, 9, 5, 6, 4, 5, 6, 4], [5, 6, 4], [7, 8]]
[10, 9, 5, 6, 4, 5, 6, 4] <- [5, 6, 4]
false
From there, the cluster [10, 9, 5, 6, 4, 5, 6, 4, ...] just keeps growing.
To clarify, each cluster is a HashSet<Integer> (so clusters is a HashSet<HashSet<Integer>>).
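One possible explanation, which is consistent with the output above, is that a cluster is modified while it is stored in the outer hash set. A HashSet<Integer> derives its hashCode from its elements, so addAll changes the hash of a set that the outer HashSet has already filed under the old value, and a later remove on that same object then fails. The sketch below is a minimal reproduction under that assumption. The class and method names are hypothetical, and it does not use the original getDistance.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the original code.
public class MutatedSetElementDemo {

    // Merges into a cluster while it is still stored in the outer set,
    // then tries to remove that same cluster.
    static boolean removeAfterInPlaceMerge() {
        Set<Set<Integer>> clusters = new HashSet<>();
        Set<Integer> a = new HashSet<>(Arrays.asList(5, 6));
        Set<Integer> b = new HashSet<>(Arrays.asList(4));
        clusters.add(a);
        clusters.add(b);

        a.addAll(b);        // a's hashCode changes while it is inside clusters
        clusters.remove(b); // b was never modified, so this removal works

        // The outer set still has a filed under its old hash, so the lookup misses.
        return clusters.remove(a);
    }

    // Takes both clusters out first, merges, then adds the result back,
    // so the outer set always stores the merged cluster under its current hash.
    static boolean removeWithReinsertion() {
        Set<Set<Integer>> clusters = new HashSet<>();
        Set<Integer> a = new HashSet<>(Arrays.asList(5, 6));
        Set<Integer> b = new HashSet<>(Arrays.asList(4));
        clusters.add(a);
        clusters.add(b);

        clusters.remove(a);
        clusters.remove(b);
        a.addAll(b);
        clusters.add(a);

        return clusters.remove(a);
    }

    public static void main(String[] args) {
        System.out.println(removeAfterInPlaceMerge()); // false
        System.out.println(removeWithReinsertion());   // true
    }
}
```

If this is the cause, the first failure would be expected the first time minclust2 is a cluster that was itself the target of an earlier merge, which matches the [10, 9] <- [5, 6, 4] step in the output. Storing the clusters in a List, or removing both clusters before merging and adding the merged one back, are two ways to avoid changing an element while a hash-based collection holds it.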