How to normalize similarity metrics from WordNet

I am trying to calculate the semantic similarity between two words using WordNet-based similarity measures: Resnik (RES), Lin (LIN), Jiang and Conrath (JNC), and Banerjee and Pedersen (BNP).

For this, I use NLTK and WordNet 3.0. I then want to combine the similarity values obtained from the different measures. To do this, I need to normalize them, because some measures give values from 0 to 1, while others give values greater than 1.

So my question is: how do I normalize the similarity values obtained from different measures?

More info on what I'm actually trying to do: I have a set of words. I calculate the pairwise similarities between the words and delete those that are not strongly related to the other words in the set.
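The filtering step described above could be sketched roughly as follows. This is only an illustration: the function name `filter_words`, the use of the mean pairwise similarity as the criterion, the threshold value, and the toy measure `char_jaccard` are all assumptions made for the example, and a real WordNet measure would take the place of the toy one.

```python
from itertools import combinations
from typing import Callable, List

Measure = Callable[[str, str], float]


def filter_words(words: List[str], sim: Measure, threshold: float) -> List[str]:
    """Keep words whose mean similarity to the other words is >= threshold.

    The mean is an assumed criterion; a maximum or median would also work.
    """
    scores = {w: [] for w in words}
    # Each unordered pair is scored once and credited to both words.
    for w, u in combinations(words, 2):
        s = sim(w, u)
        scores[w].append(s)
        scores[u].append(s)
    return [w for w in words
            if scores[w] and sum(scores[w]) / len(scores[w]) >= threshold]


def char_jaccard(w: str, u: str) -> float:
    """Toy stand-in for a normalized WordNet measure (hypothetical)."""
    a, b = set(w), set(u)
    return len(a & b) / len(a | b)


kept = filter_words(["cat", "car", "cart", "dog"], char_jaccard, threshold=0.3)
```

With this toy measure, "dog" shares no characters with the other words, so its mean similarity is 0 and it is dropped.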

1 answer

How to normalize a single measure

Consider an arbitrary similarity measure M and an arbitrary word w.

Define m = M(w, w), the similarity of w to itself. Assuming a word is never less similar to itself than to any other word, m is the largest value M can take for w.

Let MN denote the normalized version of M.

For any two words w and u, compute MN(w, u) = M(w, u) / m.

If M is non-negative and never exceeds m, then MN takes values in [0, 1].
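As a minimal sketch of this normalization, the wrapper below divides a measure by the self-similarity of its first argument. The names `normalize` and `toy_overlap` are hypothetical, and the toy measure only stands in for a real WordNet measure so that the example runs without any corpus.

```python
from typing import Callable

Measure = Callable[[str, str], float]


def normalize(measure: Measure) -> Measure:
    """Return MN with MN(w, u) = M(w, u) / M(w, w)."""
    def normalized(w: str, u: str) -> float:
        m = measure(w, w)  # self-similarity, assumed to be the maximum for w
        if m <= 0:
            raise ValueError(f"self-similarity of {w!r} must be positive")
        return measure(w, u) / m
    return normalized


def toy_overlap(w: str, u: str) -> float:
    """Toy unnormalized measure: number of shared characters (hypothetical)."""
    return float(len(set(w) & set(u)))


toy_normalized = normalize(toy_overlap)
same = toy_normalized("cat", "cat")       # self-similarity divided by itself
partial = toy_normalized("cat", "car")    # 2 shared characters out of 3
```

Note that this only makes sense when M(w, w) is finite and of reasonable size. For a measure defined as the inverse of a distance, such as Jiang and Conrath, the self-similarity can be infinite or extremely large, in which case some other cap would have to be chosen.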

How to normalize a measure combined from many measures

To build your own measure F from k different measures m_1, m_2, ..., m_k, first normalize each m_i independently using the method above, and then choose weights:

 alpha_1, alpha_2, ..., alpha_k 

where alpha_i is the weight of the i-th measure.

The weights must be non-negative and sum to 1, i.e.:

 alpha_1 + alpha_2 + ... + alpha_k = 1 

Then compute your measure for w, u as follows, where each m_i is the normalized measure:

 F(w, u) = alpha_1 * m_1(w, u) + alpha_2 * m_2(w, u) + ... + alpha_k * m_k(w, u) 

Since each normalized m_i lies in [0, 1] and the weights are non-negative and sum to 1, F also takes values in [0, 1].
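The weighted combination can be sketched as below. The function `combine` and the two toy measures are hypothetical names used only for illustration; the measures passed in are assumed to be already normalized to [0, 1].

```python
from typing import Callable, Sequence

Measure = Callable[[str, str], float]


def combine(measures: Sequence[Measure], weights: Sequence[float]) -> Measure:
    """Return F with F(w, u) = sum_i alpha_i * m_i(w, u)."""
    if len(measures) != len(weights):
        raise ValueError("need exactly one weight per measure")
    if any(a < 0 for a in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    def combined(w: str, u: str) -> float:
        return sum(a * m(w, u) for a, m in zip(weights, measures))
    return combined


def char_jaccard(w: str, u: str) -> float:
    """Toy normalized measure: Jaccard overlap of character sets."""
    a, b = set(w), set(u)
    return len(a & b) / len(a | b)


def same_initial(w: str, u: str) -> float:
    """Toy normalized measure: 1.0 if the first letters match, else 0.0."""
    return 1.0 if w[0] == u[0] else 0.0


F = combine([char_jaccard, same_initial], [0.75, 0.25])
score = F("cat", "car")  # 0.75 * 0.5 + 0.25 * 1.0
```

Because the weights form a convex combination, F equals 1 only when every component measure equals 1, and 0 only when every component with a positive weight equals 0.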


Source: https://habr.com/ru/post/1494481/

