How to implement autovivification for a nested dictionary ONLY when assigning values?

Question

TL;DR
How can I get superkeys to be autovivified in a Python dict when assigning values to subkeys, without also getting them autovivified when checking for subkeys?

Background: Normally in Python, setting values in a nested dictionary requires manually ensuring that the higher-level keys exist before assigning to their sub-keys. That is,

    my_dict[1][2] = 3

will not reliably work as intended without first doing something like

    if 1 not in my_dict:
        my_dict[1] = {}

Now, it is possible to set up a kind of autovivification by making my_dict an instance of a class that overrides __missing__, as shown e.g. at https://stackoverflow.com>
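The linked answer is not reproduced here, so the following is only a minimal sketch, assuming the usual __missing__-based recipe. The class body is an assumption, not a quote from that answer; the name Vividict is the one used in the session below.

```python
class Vividict(dict):
    """Sketch of a __missing__-based autovivifying dict (assumed implementation)."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ when the key is absent.
        # Create a nested Vividict, store it under the key, and return it.
        value = self[key] = type(self)()
        return value


vd = Vividict()
vd[1][2] = 3  # vd[1] is created on the fly by __missing__
print(vd)     # {1: {2: 3}}
```

Because __missing__ is triggered by any lookup of an absent key, not only by lookups that are part of an assignment, a read such as vd[5] creates the key as well.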

Question: However, this solution silently autovivifies higher-level keys when you merely check for the existence of a sub-key in such a nested dict. That leads to the following unfortunate behavior:

    >>> vd = Vividict()
    >>> 1 in vd
    False
    >>> 2 in vd[1]
    False
    >>> 1 in vd
    True

How can I avoid this misleading result? In Perl, by the way, I can get the desired behavior by doing

    no autovivification qw/exists/;

Essentially, I would like to replicate this behavior in Python, if possible.

python dictionary autovivification

J. Lerman Feb 08 '17 at 20:03

2 answers

kindall · Answer 1 · 2017-02-08T20:07:42+0000

This is not an easy problem to solve, because in your example:

    my_dict[1][2] = 3

my_dict[1] results in a __getitem__ call on the dictionary. At that point, there is no way to know that an assignment is being made. Only the last [] in the sequence is a __setitem__ call, and it can't succeed unless my_dict[1] exists, because otherwise, what object would you be assigning into?
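To make that call sequence visible, here is a small sketch using a hypothetical TracingDict helper (not part of the question or of any library) that records which special methods are invoked:

```python
class TracingDict(dict):
    """Hypothetical helper: records the special-method calls made on it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.calls.append(("__setitem__", key, value))
        super().__setitem__(key, value)


inner = TracingDict()
outer = TracingDict()
outer[1] = inner     # set up the intermediate level by hand
outer.calls.clear()  # forget the setup call

outer[1][2] = 3
print(outer.calls)   # [('__getitem__', 1)]
print(inner.calls)   # [('__setitem__', 2, 3)]
```

The outer dictionary only ever sees a plain lookup; the assignment is delivered to whatever object that lookup returned.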

So, don't use autovivification. Instead, you can use setdefault() with a regular dict.

    my_dict.setdefault(1, {})[2] = 3
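As a small sketch of how the same pattern extends to deeper nesting (the keys and values here are made up for illustration), each intermediate level gets its own setdefault() call, and membership checks create nothing:

```python
my_dict = {}

# One setdefault() per intermediate level; only the last step is an assignment.
my_dict.setdefault(1, {}).setdefault(2, {})[3] = "x"
print(my_dict)  # {1: {2: {3: 'x'}}}

# Membership checks on a plain dict never create keys.
# dict.get() with a default lets us look one level down safely.
print(9 in my_dict.get(5, {}))  # False
print(5 in my_dict)             # False
```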

That's not exactly pretty, especially when you are nesting more deeply, so you might write a helper method:

    class MyDict(dict):

        def nest(self, keys, value):
            for key in keys[:-1]:
                self = self.setdefault(key, {})
            self[keys[-1]] = value

    my_dict = MyDict()
    my_dict.nest((1, 2), 3)  # my_dict[1][2] = 3

But even better is to wrap this into a new __setitem__ that takes all the indexes at once, instead of requiring the intermediate __getitem__ calls that induce the autovivification. This way, we know from the beginning that we are doing an assignment and can proceed without relying on autovivification.

    class MyDict(dict):

        def __setitem__(self, keys, value):
            if not isinstance(keys, tuple):
                return dict.__setitem__(self, keys, value)
            for key in keys[:-1]:
                self = self.setdefault(key, {})
            dict.__setitem__(self, keys[-1], value)

    my_dict = MyDict()
    my_dict[1, 2] = 3

For consistency, you could also provide a __getitem__ that accepts keys in a tuple, as follows:

        def __getitem__(self, keys):
            if not isinstance(keys, tuple):
                return dict.__getitem__(self, keys)
            for key in keys:
                self = dict.__getitem__(self, key)
            return self

The only downside I can think of is that we can't use tuples as dictionary keys as easily: we have to write that as, e.g., my_dict[(1, 2),].
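Putting the two methods together, a self-contained sketch might look like this. It uses a local variable node instead of rebinding self, which is a stylistic choice of this sketch; the example keys and values are made up.

```python
class MyDict(dict):
    """Combined sketch of the tuple-aware __setitem__ and __getitem__ above."""

    def __setitem__(self, keys, value):
        if not isinstance(keys, tuple):
            return dict.__setitem__(self, keys, value)
        node = self
        for key in keys[:-1]:
            # Intermediate levels are created only here, during assignment.
            node = node.setdefault(key, {})
        dict.__setitem__(node, keys[-1], value)

    def __getitem__(self, keys):
        if not isinstance(keys, tuple):
            return dict.__getitem__(self, keys)
        node = self
        for key in keys:
            node = dict.__getitem__(node, key)
        return node


my_dict = MyDict()
my_dict[1, 2] = 3
print(my_dict[1, 2])  # 3

# Reading a missing path raises KeyError and creates nothing.
try:
    my_dict[7, 8]
except KeyError:
    print(7 in my_dict)  # False

# A real tuple key has to be wrapped in a one-element tuple.
my_dict[(1, 2),] = "pair"
print(my_dict[(1, 2),])  # pair
```

Note that the intermediate levels created by setdefault() are plain dicts, so only the outermost object understands the tuple syntax.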

dhke · Answer 2 · 2017-02-08T21:24:23+0000

The correct answer: do not do this in Python, since explicit is better than implicit.

But if you really want autovivification that does not keep empty sub-dictionaries, you can emulate the behavior in Python.

    try:
        from collections.abc import MutableMapping  # Python 3.3+
    except ImportError:
        from collections import MutableMapping  # Python 2

    class AutoDict(MutableMapping, object):
        def __init__(self, *args, **kwargs):
            super(AutoDict, self).__init__()
            self.data = dict(*args, **kwargs)

        def __getitem__(self, key):
            if key in self.data:
                return self.data.__getitem__(key)
            else:
                # Missing key: hand out a detached child that attaches
                # itself to us only once something is stored in it.
                return ChildAutoDict(parent=self, parent_key=key)

        def __setitem__(self, key, value):
            return self.data.__setitem__(key, value)

        def __delitem__(self, key):
            return self.data.__delitem__(key)

        def __iter__(self):
            return self.data.__iter__()

        def __len__(self):
            return self.data.__len__()

        def keys(self):
            return self.data.keys()

        def __contains__(self, key):
            return self.data.__contains__(key)

        def __str__(self):
            return str(self.data)

        def __unicode__(self):  # only used on Python 2
            return unicode(self.data)

        def __repr__(self):
            return repr(self.data)

    class ChildAutoDict(AutoDict):
        def __init__(self, parent, parent_key):
            super(ChildAutoDict, self).__init__()
            self.parent = parent
            self.parent_key = parent_key

        def __setitem__(self, key, value):
            if self.parent is not None:
                if self.parent_key not in self.parent:
                    # Attach through the parent's __setitem__ so that a
                    # still-detached parent attaches itself as well.
                    self.parent[self.parent_key] = self
                elif self.parent.data[self.parent_key] is not self:
                    # if parent got a new key in the meantime,
                    # don't add ourselves
                    self.parent = None
            return self.data.__setitem__(key, value)

        def __delitem__(self, key):
            ret = self.data.__delitem__(key)
            # only remove ourselves from the parent if we are
            # still occupying our slot.
            if (not self and self.parent is not None
                    and self is self.parent.data.get(self.parent_key)):
                # Delete through the parent so that empty ancestors
                # are removed as well.
                del self.parent[self.parent_key]
            return ret

What you get back from __getitem__() is, in essence, a dictionary facade that adds itself to the parent dictionary only once it is no longer empty, and removes itself once it becomes empty.

All of this, of course, stops working as soon as you assign a "normal" dictionary somewhere in the middle: after d[2] = {} or d[2][3] = {}, autovivification below that point no longer works, and so on.

I have not tested this very carefully, so beware of more bugs.

    d = AutoDict()

    print(1 in d)   # False
    print(d)        # {}
    print(d[2][3])  # {}
    print(d[2])     # {}
    print(d)        # {}

    d[2][3] = 1
    print(d)        # {2: {3: 1}}

    del d[2][3]
    print(d)        # {}
