Why aren't 0-0 matches included when calculating the Jaccard distance between binary vectors?

I am working on a program based on Jaccard Distance, and I need to calculate the Jaccard Distance value between two binary bit vectors. I came across the following on the net:

 If p1 = 10111 and p2 = 10011,

 The total number of each combination attributes for p1 and p2:

 M11 = total number of attributes where p1 & p2 have a value 1,
 M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
 M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
 M00 = total number of attributes where p1 & p2 have a value 0.
 Jaccard similarity coefficient = J = intersection/union = M11/(M01 + M10 + M11)
 = 3/(0 + 1 + 3) = 3/4,

 Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4, 
 Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11)
 = (0 + 1)/(0 + 1 + 3) = 1/4

Now, when calculating the coefficient, why is “M00” not included in the denominator? Can someone explain?

1 answer

The Jaccard index of two sets A and B is |A∩B| / |A∪B| = |A∩B| / (|A| + |B| - |A∩B|).

Take A to be the set of attributes where p1 is 1 and B the set of attributes where p2 is 1. Then: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.

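So |A∩B| / (|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11). M00 counts the attributes that are 0 in both vectors, i.e. those that belong to neither A nor B, so they are not part of the union and never enter the denominator.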
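If it helps to check the numbers, here is a minimal Python sketch of the calculation from the question. The function names `match_counts` and `jaccard_distance` are my own, the vectors are assumed to be equal-length strings of `0` and `1`, and returning 0 for two all-zero vectors is just a convention I picked (the ratio is 0/0 there).

```python
from fractions import Fraction


def match_counts(p1: str, p2: str) -> dict:
    """Count M11, M10, M01 and M00 for two equal-length bit strings."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    counts = {"M11": 0, "M10": 0, "M01": 0, "M00": 0}
    for a, b in zip(p1, p2):
        # a is the bit from p1, b the bit from p2, e.g. "1","0" -> "M10"
        counts["M" + a + b] += 1
    return counts


def jaccard_distance(p1: str, p2: str) -> Fraction:
    """Jaccard distance (M01 + M10) / (M01 + M10 + M11); M00 is not used."""
    m = match_counts(p1, p2)
    union = m["M11"] + m["M10"] + m["M01"]
    if union == 0:
        # Both vectors are all zeros: 0/0, treated as distance 0 here (assumption).
        return Fraction(0)
    return Fraction(m["M10"] + m["M01"], union)


print(match_counts("10111", "10011"))
print(jaccard_distance("10111", "10011"))
# Appending positions that are 0 in both vectors only increases M00,
# so the distance stays the same.
print(jaccard_distance("10111000", "10011000"))
```

The last call shows the practical consequence: attributes that neither vector has do not make the two vectors any more similar under Jaccard.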

This Venn diagram may help: [image not available]


Source: https://habr.com/ru/post/1778992/

