Calculation of the similarity "score" between multiple dictionaries

Question

I have a reference dictionary, dictA, and I need to compare it with n other dictionaries that are generated on the spot, calculating how similar their key/value pairs are. All dictionaries have the same length. For the sake of discussion, suppose n is 3, so the dictionaries to compare against are dictB, dictC, and dictD.

This is what dictA looks like:

dictA={'1':"U", '2':"D", '3':"D", '4':"U", '5':"U",'6':"U"}

Here's what dictB, dictC, and dictD look like:

dictB={'1':"U", '2':"U", '3':"D", '4':"D", '5':"U",'6':"D"}
dictC={'1':"U", '2':"U", '3':"U", '4':"D", '5':"U",'6':"D"}
dictD={'1':"D", '2':"U", '3':"U", '4':"U", '5':"D",'6':"D"}

I have a solution, but it only handles two dictionaries at a time:

sharedValue = set(dictA.items()) & set(dictD.items())
dictLength = len(dictA)
scoreOfSimilarity = len(sharedValue)
similarity = scoreOfSimilarity/dictLength

My question: how can I iterate over the n dictionaries and compare each one with dictA, the main dictionary? The goal is to get a "similarity" value for every dictionary that I compare against the main one.

Thank you for your help.

python python-3.x

lechiffre, Oct 11 '16 at 22:11

Prune · Answer 1 · 2016-10-11T22:29:04+0000

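Here is a general structure, assuming you can generate the dictionaries one at a time and use each before generating the next. That sounds like what you want. calculate_similarity would be a function containing your "I have a solution" code above.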

reference = {'1':"U", '2':"D", '3':"D", '4':"U", '5':"U",'6':"U"}
while True:
    on_the_spot = generate_dictionary()
    if on_the_spot is None:
        break
    calculate_similarity(reference, on_the_spot)

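If you need to iterate over dictionaries that have already been generated, they have to live in an iterable Python structure. As you generate them, build a list of dictionaries: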

victim_list = [
    {'1':"U", '2':"U", '3':"D", '4':"D", '5':"U",'6':"D"},
    {'1':"U", '2':"U", '3':"U", '4':"D", '5':"U",'6':"D"},
    {'1':"D", '2':"U", '3':"U", '4':"U", '5':"D",'6':"D"}
]
for on_the_spot in victim_list:
    # Proceed as above
    calculate_similarity(reference, on_the_spot)

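Are you familiar with Python generators? A generator is like a function that hands back its values with yield rather than return. If so, use one in place of the list above.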
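A generator-based variant of the loops above might look like the following sketch. The name generate_dictionaries and its hard-coded data are hypothetical stand-ins for whatever actually produces the dictionaries; calculate_similarity simply wraps the two-dictionary code from the question.

def calculate_similarity(reference, candidate):
    """Fraction of the reference's (key, value) pairs also found in candidate."""
    shared = set(reference.items()) & set(candidate.items())
    return len(shared) / len(reference)


def generate_dictionaries():
    """Hypothetical generator: yields one dictionary at a time."""
    # Stand-in data; in practice each dictionary would be built on demand.
    yield {'1': "U", '2': "U", '3': "D", '4': "D", '5': "U", '6': "D"}
    yield {'1': "U", '2': "U", '3': "U", '4': "D", '5': "U", '6': "D"}
    yield {'1': "D", '2': "U", '3': "U", '4': "U", '5': "D", '6': "D"}


reference = {'1': "U", '2': "D", '3': "D", '4': "U", '5': "U", '6': "U"}

# Each dictionary is scored as soon as it is produced; none is kept around.
scores = [calculate_similarity(reference, d) for d in generate_dictionaries()]
print(scores)

Note that calculate_similarity raises ZeroDivisionError for an empty reference dictionary, so a real caller may want to guard against that case.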

Rahul Murmuria · Answer 2 · 2016-10-12T16:53:15+0000

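Based on your problem setup, there appears to be no alternative to looping over the input dictionaries. However, multiprocessing can be applied here.

Here is your input: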

dict_a = {'1': "U", '2': "D", '3': "D", '4': "U", '5': "U", '6': "U"}
dict_b = {'1': "U", '2': "U", '3': "D", '4': "D", '5': "U", '6': "D"}
dict_c = {'1': "U", '2': "U", '3': "U", '4': "D", '5': "U", '6': "D"}
dict_d = {'1': "D", '2': "U", '3': "U", '4': "U", '5': "D", '6': "D"}
other_dicts = [dict_b, dict_c, dict_d]

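I have included @gary_fixler's map technique as similarity1, alongside similarity2, which I use for the loop and multiprocessing techniques.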

def similarity1(a):
    def _(b):
        shared_value = set(a.items()) & set(b.items())
        dict_length = len(a)
        score_of_similarity = len(shared_value)
        return score_of_similarity / dict_length
    return _

def similarity2(c):
    a, b = c
    shared_value = set(a.items()) & set(b.items())
    dict_length = len(a)
    score_of_similarity = len(shared_value)
    return score_of_similarity / dict_length

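We are evaluating 3 techniques here:
(1) @gary_fixler's map
(2) a simple loop over the list of dicts
(3) multiprocessing over the list of dicts

Here are the execution statements: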

import itertools
import multiprocessing

print(list(map(similarity1(dict_a), other_dicts)))
print([similarity2((dict_a, dict_v)) for dict_v in other_dicts])

# Use at least one worker, even on machines with only a few cores
max_processes = max(1, multiprocessing.cpu_count() // 2 - 1)
pool = multiprocessing.Pool(processes=max_processes)
print([x for x in pool.map(similarity2, zip(itertools.repeat(dict_a), other_dicts))])

You will find that all 3 techniques produce the same result:

[0.5, 0.3333333333333333, 0.16666666666666666]
[0.5, 0.3333333333333333, 0.16666666666666666]
[0.5, 0.3333333333333333, 0.16666666666666666]

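Note that, for multiprocessing, you have multiprocessing.cpu_count()/2 physical cores (assuming each core is hyper-threaded, so it is counted twice). If nothing else is running on the system and the program has no I/O or synchronization needs (as is the case here), you will often get the best performance with multiprocessing.cpu_count()/2-1 processes, the -1 being for the parent process.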
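The process-count heuristic and the pool call can be wrapped in small helpers, as in this sketch. The names pick_process_count and parallel_scores are hypothetical, functools.partial is swapped in for the zip/itertools.repeat pairing, and the floor of one process is an added assumption so that machines with four or fewer logical CPUs still get a worker.

import functools
import multiprocessing


def similarity(reference, candidate):
    """Fraction of the reference's (key, value) pairs also found in candidate."""
    shared = set(reference.items()) & set(candidate.items())
    return len(shared) / len(reference)


def pick_process_count(logical_cpus):
    """Half the logical CPUs minus one for the parent, but never below 1."""
    return max(1, logical_cpus // 2 - 1)


def parallel_scores(reference, candidates, processes):
    """Score every candidate against the reference using a worker pool."""
    # partial() fixes the first argument, so each worker receives one candidate.
    worker = functools.partial(similarity, reference)
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(worker, candidates)

A caller would run something like parallel_scores(dict_a, other_dicts, pick_process_count(multiprocessing.cpu_count())) from under an if __name__ == "__main__": guard, which multiprocessing requires on platforms that start workers by spawning a fresh interpreter.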

Now, to time the 3 techniques:

import timeit

print(timeit.timeit("list(map(similarity1(dict_a), other_dicts))",
                    setup="from __main__ import similarity1, dict_a, other_dicts",
                    number=10000))

print(timeit.timeit("[similarity2((dict_a, dict_v)) for dict_v in other_dicts]",
                    setup="from __main__ import similarity2, dict_a, other_dicts",
                    number=10000))

# The timed statement uses itertools, so the setup has to import it as well
print(timeit.timeit("[x for x in pool.map(similarity2, zip(itertools.repeat(dict_a), other_dicts))]",
                    setup="import itertools; from __main__ import similarity2, dict_a, other_dicts, pool",
                    number=10000))

This produces the following results:

0.07092539698351175
0.06757041101809591
1.6528456939850003

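As you can see, the basic loop performs best. Multiprocessing is significantly worse than the other 2 techniques, because of the overhead of creating processes and passing data back and forth. That does not mean multiprocessing is useless here. Quite the contrary. See the results for a larger number of input dictionaries: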

for _ in range(7):
    other_dicts.extend(other_dicts)

This extends the list of dictionaries to 384 items. Here are the timing results for this input:

7.934810006991029
8.184540337068029
7.466550623998046

For any larger set of input dictionaries, multiprocessing becomes the fastest technique.

Gary Fixler · Answer 3 · 2016-10-11T22:36:26+0000

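If you put your solution in a function, you can call it by name for any two dicts. Also, if you curry the function by splitting its arguments across nested functions, you can partially apply the first dict to get back a function that only needs the second (or you could use functools.partial), which makes it easy to map: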

def similarity(a):
    def _(b):
        sharedValue = set(a.items()) & set(b.items())
        dictLength = len(a)
        scoreOfSimilarity = len(sharedValue)
        return scoreOfSimilarity/dictLength
    return _

Incidentally, the above can also be written as a single expression with nested lambdas:

similarity = lambda a: lambda b: len(set(a.items()) & set(b.items())) / len(a)

Now you can get the similarity between dictA and the rest with a map:

otherDicts = [dictB, dictC, dictD]
scores = map(similarity(dictA), otherDicts)

Now you can use min() (or max(), or whatever fits) to pick a result from the scores:

winner = min(scores)

Warning: I have not tested any of the above.
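The functools.partial alternative mentioned in this answer can be sketched as follows. Keeping the candidates in a dict keyed by name is an assumption made here so that the winner can be reported by name; since the score is a similarity, max() picks the closest match and min() the most different one.

from functools import partial


def similarity(a, b):
    """Fraction of a's (key, value) pairs that also appear in b."""
    shared = set(a.items()) & set(b.items())
    return len(shared) / len(a)


dictA = {'1': "U", '2': "D", '3': "D", '4': "U", '5': "U", '6': "U"}
others = {
    "dictB": {'1': "U", '2': "U", '3': "D", '4': "D", '5': "U", '6': "D"},
    "dictC": {'1': "U", '2': "U", '3': "U", '4': "D", '5': "U", '6': "D"},
    "dictD": {'1': "D", '2': "U", '3': "U", '4': "U", '5': "D", '6': "D"},
}

# partial() fixes the first argument, like the curried version above.
score_against_a = partial(similarity, dictA)
scores = {name: score_against_a(d) for name, d in others.items()}

best = max(scores, key=scores.get)    # most similar to dictA
worst = min(scores, key=scores.get)   # least similar to dictA
print(best, scores[best])
print(worst, scores[worst])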

lechiffre · Answer 4 · 2016-10-12T13:55:55+0000

Thanks to everyone for participating in the response. Here is the result that does what I need:

def compareTwoDictionaries(self, absolute, reference, listOfDictionaries):
    # Despite its name, listOfDictionaries is the single dictionary being compared
    if absolute:
        # exact match only: True or False
        similarity = reference == listOfDictionaries
    else:
        # (key, value) pairs present in both dictionaries
        shared_items = set(reference.items()) & set(listOfDictionaries.items())
        # size of the reference, used as the denominator of the score
        dictLength = len(reference)
        # number of shared pairs, used as the numerator of the score
        scoreOfSimilarity = len(shared_items)
        # final score: fraction of reference pairs that match
        similarity = scoreOfSimilarity/dictLength
    return similarity

Here is the function call:

for candidate in victim_list:
    output = oandaConnectorCalls.compareTwoDictionaries(False, reference, candidate)

The "reference" dict and the list of dicts "victim_list" are used as described in the answers above.
