Deep check on two python dictionaries and get the difference in report form

Question

Let's say there are two dictionaries in Python:

Dict1

    mydict1 = {
        "Person": {
            "FName": "Rakesh",
            "LName": "Roshan",
            "Gender": "Male",
            "Status": "Married",
            "Age": "60",
            "Children": [
                {
                    "Fname": "Hrithik",
                    "Lname": "Roshan",
                    "Gender": "Male",
                    "Status": "Married",
                    "Children": ["Akram", "Kamal"],
                },
                {
                    "Fname": "Pinky",
                    "Lname": "Roshan",
                    "Gender": "Female",
                    "Status": "Married",
                    "Children": ["Suzan", "Tina", "Parveen"]
                }
            ],
            "Movies": {
                "The Last Day": {
                    "Year": 1990,
                    "Director": "Mr. Kapoor"
                },
                "Monster": {
                    "Year": 1991,
                    "Director": "Mr. Khanna"
                }
            }
        }
    }

Dict2

    mydict2 = {
        "Person": {
            "FName": "Rakesh",
            "LName": "Roshan",
            "Gender": "Male",
            "Status": "Married",
            "Children": [
                {
                    "Fname": "Hrithik",
                    "Lname": "Losan",
                    "Gender": "Male",
                    "Status": "Married",
                    "Children": ["Akram", "Ajamal"],
                },
                {
                    "Fname": "Pinky",
                    "Lname": "Roshan",
                    "Gender": "Female",
                    "Status": "Married",
                    "Children": ["Suzan", "Tina"]
                }
            ]
        }
    }

I want to compare these two dictionaries and print the differences in a report format like the one below:

    MISMATCH 1
    ==========
    MATCH DICT KEY : Person >> Children >> LName
    EXPECTED : Roshan
    ACTUAL : Losan

    MISMATCH 2
    ==========
    MATCH LIST ITEM : Person >> Children >> Children
    EXPECTED : Kamal
    ACTUAL : Ajamal

    MISMATCH 3
    ==========
    MATCH LIST ITEM : Person >> Children >> Children
    EXPECTED : Parveen
    ACTUAL : NOT_FOUND

    MISMATCH 4
    ==========
    MATCH DICT KEY : Person >> Age
    EXPECTED : 60
    ACTUAL : NOT_FOUND

    MISMATCH 5
    ==========
    MATCH DICT KEY : Person >> Movies
    EXPECTED : { Movies : {<COMPLETE DICT>} }
    ACTUAL : NOT_FOUND

I tried the Python module datadiff, but it returns its result as a dictionary that is not very readable. To generate a report from it, I have to traverse that dictionary and look for the "+" and "-" keys. If the dictionary is complex, that traversal becomes hard.

python

Ronu Jul 12 '13 at 15:03

1 answer

mr2ert · Answer 1 · 2013-07-12T17:21:32+0000

UPDATE: I updated the code to handle lists better. I also commented the code to make it clearer in case you need to modify it.

This answer is not 100% general right now, but it can be easily expanded to fit what you need.

    def print_error(exp, act, path=()):
        if path:
            print('MATCH LIST ITEM: %s' % '>>'.join(path))
        print('EXPECTED: %s' % str(exp))
        print('ACTUAL: %s' % str(act))
        print('')

    def copy_append(lst, item):
        # Return a new list so the caller's path is never modified
        foo = list(lst)
        foo.append(str(item))
        return foo

    def deep_check(comp, compto, path=(), print_errors=True):
        # Total number of errors found, is needed for when
        # testing the similarity of dicts
        errors = 0

        if isinstance(comp, list):
            # If the types are not the same then it is probably a critical error
            # return a number to represent how important this is
            if not isinstance(compto, list):
                if print_errors:
                    print_error(comp, 'NOT_LIST', path)
                return 1

            # We don't want to destroy the original lists
            comp_copy = comp[:]
            compto_copy = compto[:]

            # Remove items that are both in comp and compto
            # and find items that are only in comp
            for item in comp_copy[:]:
                try:
                    compto_copy.remove(item)
                    # Only is removed if the item is in compto_copy
                    comp_copy.remove(item)
                except ValueError:
                    # dicts need to be handled differently
                    if isinstance(item, dict):
                        continue
                    # Drop the reported item so only dicts remain in comp_copy
                    comp_copy.remove(item)
                    if print_errors:
                        print_error(item, 'NOT_FOUND', path)
                    errors += 1

            # Find non-dicts that are only in compto
            for item in compto_copy[:]:
                if isinstance(item, dict):
                    continue
                compto_copy.remove(item)
                if print_errors:
                    print_error('NOT_FOUND', item, path)
                errors += 1

            # Now both copies only have dicts

            # This is the part that compares dicts with the minimum
            # errors between them, it is expensive since each dict in comp_copy
            # has to be compared against each dict in compto_copy
            for c in comp_copy:
                lowest_errors = None
                lowest_value = None
                for ct in compto_copy:
                    errors_in = deep_check(c, ct, path, print_errors=False)
                    # Get and store the minimum errors
                    # (test for None first so the comparison is always int vs int)
                    if lowest_errors is None or errors_in < lowest_errors:
                        lowest_errors = errors_in
                        lowest_value = ct
                if lowest_errors is not None:
                    errors += lowest_errors
                    # Has to have print_errors passed in case the list of dicts
                    # contains a list of dicts
                    deep_check(c, lowest_value, path, print_errors)
                    compto_copy.remove(lowest_value)
            return errors

        if not isinstance(compto, dict):
            # If the types are not the same then it is probably a critical error
            # return a number to represent how important this is
            if print_errors:
                print_error(comp, 'NOT_DICT', path)
            return 1

        # Keys that are only in compto
        for key in compto:
            if key not in comp:
                if print_errors:
                    print_error('NO_KEY', key, copy_append(path, key))
                errors += 1

        # Keys that are only in comp, and values that differ
        for key, value in comp.items():
            try:
                tovalue = compto[key]
            except KeyError:
                if print_errors:
                    print_error(value, 'NOT_FOUND', copy_append(path, key))
                errors += 1
                continue

            if isinstance(value, (list, dict)):
                errors += deep_check(value, tovalue, copy_append(path, key), print_errors)
            else:
                if value != tovalue:
                    if print_errors:
                        print_error(value, tovalue, copy_append(path, key))
                    errors += 1

        return errors

With your dicts as input, I get:

    MATCH LIST ITEM: Person>>Age
    EXPECTED: 60
    ACTUAL: NOT_FOUND

    MATCH LIST ITEM: Person>>Movies
    EXPECTED: {'The Last Day': {'Director': 'Mr. Kapoor', 'Year': 1990}, 'Monster': {'Director': 'Mr. Khanna', 'Year': 1991}}
    ACTUAL: NOT_FOUND

    MATCH LIST ITEM: Person>>Children>>Lname
    EXPECTED: Roshan
    ACTUAL: Losan

    MATCH LIST ITEM: Person>>Children>>Children
    EXPECTED: Kamal
    ACTUAL: NOT_FOUND

    MATCH LIST ITEM: Person>>Children>>Children
    EXPECTED: NOT_FOUND
    ACTUAL: Ajamal

    MATCH LIST ITEM: Person>>Children>>Children
    EXPECTED: Parveen
    ACTUAL: NOT_FOUND
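The output above does not carry the numbered MISMATCH headers that the question asks for. As a minimal sketch, assuming the mismatches are first collected as tuples instead of being printed right away, a helper along these lines could render them. The name `format_report` and the `(kind, path, expected, actual)` tuple layout are assumptions made for illustration; they are not part of the code above.

```python
def format_report(mismatches):
    """Render (kind, path, expected, actual) tuples as a numbered report.

    The tuple layout is a hypothetical convention chosen for this sketch.
    """
    lines = []
    for number, (kind, path, expected, actual) in enumerate(mismatches, 1):
        lines.append('MISMATCH %d' % number)
        lines.append('==========')
        lines.append('MATCH %s : %s' % (kind, ' >> '.join(path)))
        # Wrap in a 1-tuple so tuple values are formatted as a single item
        lines.append('EXPECTED : %s' % (expected,))
        lines.append('ACTUAL : %s' % (actual,))
        lines.append('')  # blank line between entries
    return '\n'.join(lines)


# Hand-written sample records, taken from the question's expected report
mismatches = [
    ('DICT KEY', ['Person', 'Age'], '60', 'NOT_FOUND'),
    ('LIST ITEM', ['Person', 'Children', 'Children'], 'Parveen', 'NOT_FOUND'),
]
print(format_report(mismatches))
```

To use something like this with deep_check, print_error would have to append to a shared list rather than print, which is a change left to the reader.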

The way lists are compared has been updated. Take these two lists:

    ['foo', 'bar']
    ['foo', 'bing', 'bar']

The only error raised is that 'bing' is not in the first list. With string values, a value is either in the list or it is not, but a problem arises when you compare lists of dicts: you end up with dicts that fail to match to varying degrees, and knowing which dicts to compare with each other is not straightforward.
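For the simple case of non-dict values, the list comparison can be sketched on its own. The helper below is a simplified illustration, not a copy of the function above; the name `diff_flat_lists` is made up for this example. It uses the same `list.remove` idea: every expected item that can be removed from a copy of the actual list is a match, and whatever is left over exists on one side only.

```python
def diff_flat_lists(expected, actual):
    """Return (missing, extra) for two lists of non-dict items.

    missing: items in expected with no counterpart in actual
    extra:   items in actual with no counterpart in expected
    Duplicates are matched one-to-one, and order is ignored.
    """
    remaining = list(actual)  # work on a copy, keep the input intact
    missing = []
    for item in expected:
        try:
            remaining.remove(item)  # consume one matching item
        except ValueError:
            missing.append(item)    # nothing left to match it with
    return missing, remaining


missing, extra = diff_flat_lists(['foo', 'bar'], ['foo', 'bing', 'bar'])
print(missing)  # []
print(extra)    # ['bing']
```

Because matching ignores position, inserting 'bing' in the middle does not shift 'bar' into a mismatch, which is why only one error is reported.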

My implementation solves this by assuming that the pairs of dicts that produce the fewest errors are the ones that should be compared with each other. For instance:

    test1 = {
        "Name": "Org Name",
        "Members": [
            {
                "Fname": "foo",
                "Lname": "bar",
                "Gender": "Neuter",
                "Roles": ["President", "Vice President"]
            },
            {
                "Fname": "bing",
                "Lname": "bang",
                "Gender": "Neuter",
                "Roles": ["President", "Vice President"]
            }
        ]
    }

    test2 = {
        "Name": "Org Name",
        "Members": [
            {
                "Fname": "bing",
                "Lname": "bang",
                "Gender": "Male",
                "Roles": ["President", "Vice President"]
            },
            {
                "Fname": "foo",
                "Lname": "bar",
                "Gender": "Female",
                "Roles": ["President", "Vice President"]
            }
        ]
    }

Produces this output:

    MATCH LIST ITEM: Members>>Gender
    EXPECTED: Neuter
    ACTUAL: Female

    MATCH LIST ITEM: Members>>Gender
    EXPECTED: Neuter
    ACTUAL: Male
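The pairing rule can also be looked at in isolation. The following is a minimal sketch for flat dicts only, with no nested lists or dicts; `count_mismatches` and `pair_by_fewest_errors` are hypothetical names introduced for this illustration. Each expected dict is paired with whichever remaining actual dict differs from it in the fewest keys.

```python
_MISSING = object()  # sentinel for "key absent", distinct from any real value


def count_mismatches(expected, actual):
    """Count keys whose values differ or that exist on one side only."""
    keys = set(expected) | set(actual)
    return sum(
        1 for key in keys
        if expected.get(key, _MISSING) != actual.get(key, _MISSING)
    )


def pair_by_fewest_errors(expected_dicts, actual_dicts):
    """Greedily pair each expected dict with its closest remaining actual dict."""
    remaining = list(actual_dicts)  # copy, so the input list is not modified
    pairs = []
    for exp in expected_dicts:
        if not remaining:
            break  # no candidates left; unmatched dicts are simply skipped here
        best = min(remaining, key=lambda act: count_mismatches(exp, act))
        pairs.append((exp, best))
        remaining.remove(best)
    return pairs


expected = [
    {"Fname": "foo", "Lname": "bar", "Gender": "Neuter"},
    {"Fname": "bing", "Lname": "bang", "Gender": "Neuter"},
]
actual = [
    {"Fname": "bing", "Lname": "bang", "Gender": "Male"},
    {"Fname": "foo", "Lname": "bar", "Gender": "Female"},
]
for exp, act in pair_by_fewest_errors(expected, actual):
    print(exp["Fname"], "<->", act["Fname"], count_mismatches(exp, act))
```

This is a greedy choice made in list order, so an early pairing can take a candidate that a later dict would have matched better. It also costs one comparison per candidate pair, which is the expense mentioned in the code comments.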
