Which logarithm base should be used to implement the Kullback-Leibler divergence?

I wrote a method that implements the Kullback-Leibler divergence in Java. I used a logarithm with base 2, and I'm not sure whether that is correct or whether I should use base 10 instead. I use this method to measure the difference between two text units (each of a different length).

My problem is that I do not get the desired measure of discrepancy.

For example, take two text blocks: the first is "Free ringtones" and the second is "Free ringtones for your mobile phone from PremieRingtones.com".

I should get a difference of 0.25 (for my project), but I get 2.0 if I use log base 2 and 1.38 with log base 10.

Also, I don't know what value to use in place of a zero denominator. Please help with a clear explanation, some examples if possible, and links to where I can find more details.

This is my piece of code:

public Double calculateKLD(List<String> values, List<String> value2)
{
    // Count how often each token occurs in each text block.
    Map<String, Integer> map = new HashMap<String, Integer>();
    Map<String, Integer> map2 = new HashMap<String, Integer>();
    for (String sequence : values)
    {
        if (!map.containsKey(sequence))
        {
            map.put(sequence, 0);
        }
        map.put(sequence, map.get(sequence) + 1);
    }

    for (String sequence : value2)
    {
        if (!map2.containsKey(sequence))
        {
            map2.put(sequence, 0);
        }
        map2.put(sequence, map2.get(sequence) + 1);
    }

    Double result = 0.0;
    Double frequency2 = 0.0;
    for (String sequence : map.keySet())
    {
        // Relative frequency of the token in the first text block.
        Double frequency1 = (double) map.get(sequence) / values.size();
        System.out.println("Frequency1 " + frequency1.toString());
        if (map2.containsKey(sequence))
        {
            // Relative frequency of the same token in the second text block.
            frequency2 = (double) map2.get(sequence) / value2.size();
        }
        // Log base 2, computed as ln(x) / ln(2).
        result += frequency1 * (Math.log(frequency1 / frequency2) / Math.log(2));
    }
    return result / 2.4;
}

My input looks like this.

First text block:

    list.add("Free"); list.add("Ringtones");

Second text block:

    list2.add("Free"); list2.add("Ringtones"); list2.add("for"); list2.add("your"); list2.add("Mobiile"); list2.add("Phone"); list2.add("from"); list2.add("PremieRingtones.com");

Function call:

    calculateKLD(list, list2)
1 answer

As you might guess, you probably want to use base e (i.e. the natural logarithm). Since the KL divergence is a statistical measure, it is most likely defined in terms of natural logarithms.
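To make this concrete, here is a minimal sketch of a KL divergence computed with the natural logarithm. It is only an illustration under stated assumptions, not the asker's method: the class name KlDivergenceSketch, the helper toBits, and the smoothing parameter alpha are all hypothetical. Additive (Laplace-style) smoothing over the combined vocabulary is one common way to avoid a zero denominator when a word is missing from one text; other choices are possible and will give different values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: KL divergence between two token lists, in nats.
class KlDivergenceSketch {

    // Count how often each token occurs in the list.
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // D(P || Q) using Math.log (natural logarithm).
    // alpha is an assumed smoothing constant added to every count so that
    // no probability is zero; it must be strictly positive.
    static double klDivergence(List<String> p, List<String> q, double alpha) {
        if (alpha <= 0.0) {
            throw new IllegalArgumentException("alpha must be > 0");
        }
        Map<String, Integer> pCounts = countTokens(p);
        Map<String, Integer> qCounts = countTokens(q);

        // Both distributions are defined over the combined vocabulary.
        Set<String> vocabulary = new HashSet<>(pCounts.keySet());
        vocabulary.addAll(qCounts.keySet());

        double pTotal = p.size() + alpha * vocabulary.size();
        double qTotal = q.size() + alpha * vocabulary.size();

        double result = 0.0;
        for (String token : vocabulary) {
            double pProb = (pCounts.getOrDefault(token, 0) + alpha) / pTotal;
            double qProb = (qCounts.getOrDefault(token, 0) + alpha) / qTotal;
            result += pProb * Math.log(pProb / qProb);
        }
        return result;
    }

    // Convert nats to bits: log2(x) = ln(x) / ln(2).
    static double toBits(double nats) {
        return nats / Math.log(2);
    }

    public static void main(String[] args) {
        List<String> first = List.of("Free", "Ringtones");
        List<String> second = List.of("Free", "Ringtones", "for", "your",
                "Mobiile", "Phone", "from", "PremieRingtones.com");
        double nats = klDivergence(first, second, 1.0);
        System.out.println("KL in nats: " + nats);
        System.out.println("KL in bits: " + toBits(nats));
    }
}
```

Note that the choice of base only rescales the result by a constant factor: a value of 2.0 in base 2 (bits) corresponds to 2 · ln 2 ≈ 1.386 in base e (nats). Changing the base therefore cannot by itself turn the result into a specific target such as 0.25; the handling of missing words and any normalization matter far more.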


Source: https://habr.com/ru/post/1788157/

