Check that Python dicts have the same shape and keys

For single-layer dicts, such as x = {'a': 1, 'b': 2}, the problem is simple and has already been answered on SO (Pythonic way to check if two dictionaries have the same set of keys?). But what about nested dicts?

For example, y = {'a': {'c': 3}, 'b': {'d': 4}} has the keys 'a' and 'b', but I want to compare its shape with another nested dict structure such as z = {'a': {'c': 5}, 'b': {'d': 6}}, which has the same shape and keys as y (different values are fine). w = {'a': {'c': 3}, 'b': {'e': 4}} also has the keys 'a' and 'b', but it differs from y on the next layer down, because w['b'] has the key 'e' while y['b'] has the key 'd'.

I want a short, simple function of two arguments, dict_1 and dict_2, that returns True if they have the same shape and keys as described above, and False otherwise.

3 answers

This builds a copy of each dictionary stripped of every non-dictionary value, then compares the copies:

    def getshape(d):
        if isinstance(d, dict):
            return {k:getshape(d[k]) for k in d}
        else:
            # Replace all non-dict values with None.
            return None

    def shape_equal(d1, d2):
        return getshape(d1) == getshape(d2)
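As a quick check, here is a minimal sketch that applies this approach to the y, z and w dictionaries from the question. The functions are restated so the snippet runs on its own; iterating over d.items() is a small stylistic variation, not part of the original answer.

```python
def getshape(d):
    """Return a copy of d in which every non-dict value is replaced by None."""
    if isinstance(d, dict):
        return {k: getshape(v) for k, v in d.items()}
    return None


def shape_equal(d1, d2):
    """True if both arguments reduce to the same nested key structure."""
    return getshape(d1) == getshape(d2)


# The example dictionaries from the question.
y = {'a': {'c': 3}, 'b': {'d': 4}}
z = {'a': {'c': 5}, 'b': {'d': 6}}
w = {'a': {'c': 3}, 'b': {'e': 4}}

print(getshape(y))        # {'a': {'c': None}, 'b': {'d': None}}
print(shape_equal(y, z))  # True
print(shape_equal(y, w))  # False
```

Since dict equality ignores insertion order, two dicts whose keys were added in a different order still compare as the same shape.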

I liked nneonneo's answer, and it should be relatively fast, but I wanted something that does not create extra, unnecessary data structures (I have been learning about memory fragmentation in Python). This may or may not be as fast or faster.

(EDIT: Spoiler!)

It is faster, by a decent enough margin to make it preferable in all cases; see the other answer with the timing analysis.

And if you are dealing with a great many of these dicts and run into memory problems, doing it this way is probably preferable.

Implementation

This should work in Python 3, possibly in 2.7 if you change keys to viewkeys, and definitely not in 2.6. It relies on the set-like view of the keys that dicts provide:

    def sameshape(d1, d2):
        if isinstance(d1, dict):
            if isinstance(d2, dict):
                # then we have shapes to check
                return (d1.keys() == d2.keys() and
                        # so the keys are all the same
                        all(sameshape(d1[k], d2[k]) for k in d1.keys()))
                        # thus all values will be tested in the same way.
            else:
                return False # d1 is a dict, but d2 isn't
        else:
            return not isinstance(d2, dict) # if d2 is a dict, False, else True.
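The comparison d1.keys() == d2.keys() works because Python 3 key views behave like sets. A minimal illustration, using made-up dictionaries that are not from the original answer:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'a': 10}   # same keys, different insertion order and values
d3 = {'a': 1, 'c': 3}     # one key differs

# Key views compare like sets: order and values are irrelevant.
same_keys = d1.keys() == d2.keys()
different_keys = d1.keys() == d3.keys()

# They also support set operators, which return ordinary sets.
common = d1.keys() & d3.keys()
only_in_d1 = d1.keys() - d3.keys()

print(same_keys)       # True
print(different_keys)  # False
print(common)          # {'a'}
print(only_in_d1)      # {'b'}
```

No intermediate set or list of keys has to be built by hand for the equality test, which fits the goal of avoiding extra data structures.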

Update: modified to reduce redundant type checks, so it is now even more efficient.
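The memory argument can be checked with the standard-library tracemalloc module. The sketch below is only an illustration under stated assumptions: the peak_bytes helper and the big_1/big_2 test data are hypothetical and do not appear in the original answers, and the absolute numbers will vary by Python version and platform.

```python
import tracemalloc


def getshape(d):
    if isinstance(d, dict):
        return {k: getshape(v) for k, v in d.items()}
    return None


def shape_equal(d1, d2):
    # Builds a stripped copy of each input before comparing.
    return getshape(d1) == getshape(d2)


def sameshape(d1, d2):
    # Walks both inputs in place without building copies.
    if isinstance(d1, dict):
        if isinstance(d2, dict):
            return (d1.keys() == d2.keys() and
                    all(sameshape(d1[k], d2[k]) for k in d1))
        return False
    return not isinstance(d2, dict)


def peak_bytes(func, *args):
    """Hypothetical helper: run func(*args), return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical test data: 2000 top-level keys, each holding a small nested dict.
big_1 = {i: {'x': i, 'y': {'z': i}} for i in range(2000)}
big_2 = {i: {'x': 0, 'y': {'z': 0}} for i in range(2000)}

result_copy, peak_copy = peak_bytes(shape_equal, big_1, big_2)
result_walk, peak_walk = peak_bytes(sameshape, big_1, big_2)

print(result_copy, result_walk)
print('shape_equal peak bytes:', peak_copy)
print('sameshape peak bytes:  ', peak_walk)
```

On inputs like these, the copying version has to allocate thousands of temporary dicts, while the in-place walk only needs short-lived generator and call-frame objects.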

Testing

To check:

    print('expect false:')
    print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None: {} }}}))
    print('expect true:')
    print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None:'foo'}}}))
    print('expect false:')
    print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None:None, 'baz':'foo'}}}))

This prints:

    expect false:
    False
    expect true:
    True
    expect false:
    False

To profile the two currently existing answers, first import timeit:

    import timeit

Now we need to set up the code:

    setup = '''
    import copy

    def getshape(d):
        if isinstance(d, dict):
            return {k:getshape(d[k]) for k in d}
        else:
            # Replace all non-dict values with None.
            return None

    def nneo_shape_equal(d1, d2):
        return getshape(d1) == getshape(d2)

    def aaron_shape_equal(d1,d2):
        if isinstance(d1, dict) and isinstance(d2, dict):
            return (d1.keys() == d2.keys() and
                    all(aaron_shape_equal(d1[k], d2[k]) for k in d1.keys()))
        else:
            return not (isinstance(d1, dict) or isinstance(d2, dict))

    class Vividict(dict):
        def __missing__(self, key):
            value = self[key] = type(self)()
            return value

    d = Vividict()

    d['foo']['bar']
    d['foo']['baz']
    d['fizz']['buzz']
    d['primary']['secondary']['tertiary']['quaternary']

    d0 = copy.deepcopy(d)
    d1 = copy.deepcopy(d)
    d1['primary']['secondary']['tertiary']['extra']
    # d == d0 is True
    # d == d1 is now False!
    '''

Now let's test the two options, first with Python 3.3:

    >>> timeit.repeat('nneo_shape_equal(d0, d); nneo_shape_equal(d1,d)', setup=setup)
    [36.784881490981206, 36.212246977956966, 36.29759863798972]

And it looks like my solution takes about 2/3 to 3/4 of the time, which makes it more than 1.25 times as fast:

    >>> timeit.repeat('aaron_shape_equal(d0, d); aaron_shape_equal(d1,d)', setup=setup)
    [26.838892214931548, 26.61037168605253, 27.170253590098582]

And in Python 3.4 (alpha), which I compiled myself:

    >>> timeit.repeat('nneo_shape_equal(d0, d); nneo_shape_equal(d1,d)', setup=setup)
    [272.5629618819803, 273.49581588001456, 270.13374400604516]
    >>> timeit.repeat('aaron_shape_equal(d0, d); aaron_shape_equal(d1,d)', setup=setup)
    [214.87033835891634, 215.69223327597138, 214.85333003790583]

Still about the same ratio. The difference in absolute times between the two Python versions is probably because I compiled 3.4 myself without optimizations.
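The ratios can be worked out directly from the timings reported above. This is a small sketch; taking the minimum of each run is my own choice here (it is the figure the timeit documentation suggests looking at), and the variable names are hypothetical.

```python
# Timings copied from the runs above (seconds per timeit.repeat entry).
nneo_33 = [36.784881490981206, 36.212246977956966, 36.29759863798972]
aaron_33 = [26.838892214931548, 26.61037168605253, 27.170253590098582]
nneo_34 = [272.5629618819803, 273.49581588001456, 270.13374400604516]
aaron_34 = [214.87033835891634, 215.69223327597138, 214.85333003790583]


def time_ratio(fast, slow):
    """Fraction of the slower version's best time that the faster one needs."""
    return min(fast) / min(slow)


ratio_33 = time_ratio(aaron_33, nneo_33)
ratio_34 = time_ratio(aaron_34, nneo_34)

for label, r in (('Python 3.3', ratio_33), ('Python 3.4 alpha', ratio_34)):
    print(f'{label}: time ratio {r:.2f}, speedup {1 / r:.2f}x')
```

This gives a time ratio of roughly 0.73 on Python 3.3 and roughly 0.80 on the 3.4 alpha build, so the speedup is a little above 1.25x in both cases.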

Thanks to all the readers!


Source: https://habr.com/ru/post/970773/

