The code is in org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity.
Yes, it is written this way because at that point in the computation it already has the norms of vectors A and B and their dot product, so computing the distance this way is much faster.
The identity is pretty simple. Let C = A - B, and let a, b and c be the lengths of the corresponding vectors. We need c. From the law of cosines, c² = a² + b² - 2ab·cos(θ), and ab·cos(θ) is simply the definition of the dot product. Note that normA in the code is actually the square of the norm (length); it really should be named better.
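To make the identity concrete, here is a minimal sketch that computes the distance both ways and compares them. The class and method names (DistanceIdentity, direct, viaNorms) are made up for illustration and are not taken from Mahout; only the expression inside viaNorms mirrors the one discussed here.

```java
public class DistanceIdentity {

    // Dot product of two equal-length vectors.
    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Direct definition: square root of the sum of squared component differences.
    static double direct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Law-of-cosines form: |A - B|^2 = |A|^2 + |B|^2 - 2 (A . B).
    // normASquared and normBSquared play the role of normA and normB in the code.
    static double viaNorms(double[] a, double[] b) {
        double normASquared = dot(a, a);
        double normBSquared = dot(b, b);
        double dots = dot(a, b);
        return Math.sqrt(normASquared - 2 * dots + normBSquared);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 6.0, 3.0};
        System.out.println(direct(a, b));   // prints 5.0
        System.out.println(viaNorms(a, b)); // prints 5.0
    }
}
```

With these small integer-valued inputs both forms agree exactly; with arbitrary doubles they can differ in the last few bits, which is where the rounding problem comes from.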
Back to the question: you are right, there is a bug here, because rounding can make the argument of sqrt() slightly negative. The fix is not abs(), but:
double euclideanDistance = Math.sqrt(Math.max(0.0, normA - 2 * dots + normB));
It just needs to be clamped at 0. I can fix this.
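A small sketch of why the clamp matters. The class name and the input values are hypothetical: the dot product is deliberately nudged just above its exact value to stand in for accumulated rounding error between two identical unit vectors.

```java
public class ClampedDistance {

    // Unclamped form: returns NaN when rounding pushes the argument below zero.
    static double unclamped(double normASquared, double dots, double normBSquared) {
        return Math.sqrt(normASquared - 2 * dots + normBSquared);
    }

    // Clamped form: a slightly negative argument is treated as zero.
    static double clamped(double normASquared, double dots, double normBSquared) {
        return Math.sqrt(Math.max(0.0, normASquared - 2 * dots + normBSquared));
    }

    public static void main(String[] args) {
        // Hypothetical inputs: squared norms of 1.0 and a dot product that
        // overshoots 1.0 by a tiny amount, imitating rounding error.
        double normASquared = 1.0;
        double normBSquared = 1.0;
        double dots = 1.0 + 1e-12;

        System.out.println(unclamped(normASquared, dots, normBSquared)); // prints NaN
        System.out.println(clamped(normASquared, dots, normBSquared));   // prints 0.0
    }
}
```

Using abs() instead would turn the tiny negative error into a small positive distance, whereas the true distance in this situation is 0, so clamping is the more faithful choice.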