Calculate Euclidean distance between two vectors (word bag) in python

Question


I use a dictionary to store the word counts of an article.

For example, {"name": 2, "your": 10, "me": 20} means that "name" appears twice, "your" appears 10 times, and "me" appears 20 times.

Is there a good way to calculate the Euclidean distance between these vectors? The difficulty is that they have different lengths: some vectors contain certain words and others do not.
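As a minimal sketch, assuming the articles are plain strings split on whitespace (the question does not say how the words are extracted), the standard library's `collections.Counter` builds exactly this kind of word-count dictionary, and comparing the key sets shows the mismatch described above. The sample texts are made up for illustration.

```python
from collections import Counter

# Hypothetical sample articles; punctuation and case handling are assumed away.
article_a = "your name your name me"
article_b = "me me your house"

# Counter is a dict subclass mapping each word to its number of occurrences.
counts_a = Counter(article_a.split())
counts_b = Counter(article_b.split())

# Words that appear in only one of the two articles (set operations on key views).
only_in_a = counts_a.keys() - counts_b.keys()  # {'name'}
only_in_b = counts_b.keys() - counts_a.keys()  # {'house'}
print(only_in_a, only_in_b)
```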

I know I could write a long function for this, but I'm looking for a simpler, smarter way. Thanks!

Edit: The goal is to measure how similar the articles are and group them accordingly.


python math vector

Bear May 23 '13 at 12:00


2 answers

You could also use the cosine similarity between the two vectors, as described here: http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/sphilip/cos.html
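A minimal sketch of cosine similarity on the word-count dictionaries, assuming the standard definition (dot product divided by the product of the two vector norms). The function name `cosine_similarity` is hypothetical and not taken from the linked page. Missing words contribute 0 to the dot product, so only the shared words need to be visited.

```python
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine of the angle between two word-count vectors stored as dicts."""
    # Words missing from either dict contribute 0, so only shared keys matter here.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    # Each norm uses all of that dict's own counts.
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when an article is empty
    return dot / (norm_a * norm_b)

a = {"name": 2, "your": 10, "me": 20}
b = {"your": 5, "me": 10, "house": 3}
print(cosine_similarity(a, b))
```

Unlike Euclidean distance, cosine similarity ignores the overall length of the articles: multiplying every count in one article by the same factor leaves the result unchanged.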


G.Ahmed Aug 30 '13 at 20:29


Blubber · Accepted Answer · 2013-05-23T12:05:43+0000

Something like:

 import math
 math.sqrt(sum((a[k] - b[k])**2 for k in a))  # assumes a and b have exactly the same keys

where a and b are dictionaries with the same keys. If you are going to compare distances across different pairs of vectors, you must make sure every vector contains exactly the same words; otherwise the distance measure is meaningless.
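A slightly more defensive sketch of the same formula: the hypothetical helper `euclidean_same_keys` refuses to run when the two dictionaries do not contain exactly the same words. The one-liner, by contrast, raises a bare `KeyError` when `b` lacks a word and silently ignores words that only `b` has.

```python
import math

def euclidean_same_keys(a: dict, b: dict) -> float:
    """Euclidean distance between two word-count dicts with identical keys."""
    # dict key views compare like sets, so word order does not matter.
    if a.keys() != b.keys():
        raise ValueError("both vectors must contain exactly the same words")
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

a = {"name": 2, "your": 10, "me": 20}
b = {"name": 5, "your": 6, "me": 20}
print(euclidean_same_keys(a, b))  # 5.0
```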

You can compute the distance over only the intersection of the keys, i.e. the words present in both dictionaries:

 math.sqrt(sum((a[k] - b[k])**2 for k in a.keys() & b.keys()))  # only words present in both a and b
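Wrapped in a small function (the name `euclidean_intersection` is made up for illustration), the intersection variant looks like this; the sample counts are invented, and words that appear in only one article simply drop out of the sum.

```python
import math

def euclidean_intersection(a: dict, b: dict) -> float:
    """Euclidean distance computed only over words present in both dicts."""
    shared = a.keys() & b.keys()  # set of words common to a and b
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))

a = {"name": 2, "your": 10, "me": 20}
b = {"your": 6, "me": 17, "house": 3}
print(euclidean_intersection(a, b))  # 5.0; "name" and "house" are ignored
```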

Another option is to take the union of the keys and treat missing counts as 0:

 math.sqrt(sum((a.get(k, 0) - b.get(k, 0))**2 for k in a.keys() | b.keys()))  # all words; a missing word counts as 0
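The union variant can also be expressed with `math.dist` (available since Python 3.8), which takes two coordinate sequences of equal length. Building both sequences over the same ordered list of words makes the dictionaries line up as ordinary vectors. A minimal sketch, with the helper name `euclidean_union` made up for illustration:

```python
import math

def euclidean_union(a: dict, b: dict) -> float:
    """Euclidean distance over all words in either dict; missing counts are 0."""
    words = sorted(a.keys() | b.keys())  # one fixed order so both vectors align
    vec_a = [a.get(w, 0) for w in words]
    vec_b = [b.get(w, 0) for w in words]
    return math.dist(vec_a, vec_b)  # requires Python 3.8+

a = {"name": 2, "your": 10, "me": 20}
b = {"your": 6, "me": 17, "house": 3}
print(euclidean_union(a, b))  # sqrt(4 + 16 + 9 + 9) = sqrt(38)
```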

But you should think carefully about what exactly you are measuring.
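A small sketch with made-up counts shows why the choice matters. Two articles with no words in common get an intersection distance of 0, as if they were identical. An article compared with a version of itself that is simply twice as long gets a large union distance even though its word proportions are unchanged. Dividing each count by the article's total word count (an extra step the answer does not propose) removes that second effect.

```python
import math

def euclidean_intersection(a, b):
    # distance over the words both dicts contain
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() & b.keys()))

def euclidean_union(a, b):
    # distance over all words, treating missing counts as 0
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in a.keys() | b.keys()))

def to_frequencies(counts):
    # turn raw counts into relative frequencies that sum to 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

cats = {"cat": 4, "purr": 2}
cars = {"engine": 3, "wheel": 5}
print(euclidean_intersection(cats, cars))  # 0.0, although no word is shared

short = {"name": 2, "your": 10, "me": 20}
doubled = {k: 2 * v for k, v in short.items()}  # same proportions, twice as long
print(euclidean_union(short, doubled))  # sqrt(504), about 22.4
print(euclidean_union(to_frequencies(short), to_frequencies(doubled)))  # 0.0
```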

