Python: the dictionary dilemma: how to properly index objects based on an attribute

Question

First, an example:

Given a bunch of Person objects with various attributes (name, SSN, phone, email address, credit card number, etc.).
Now imagine a simple website that:
uses a person's email address as a unique login
allows users to edit their attributes (including their email address).
If this website had many users, it would make sense to store the Person objects in a dictionary indexed by email address, for quick Person retrieval at login.
However, when a Person's email address is edited, the dictionary key for that Person has to change as well. This is a little yucky.
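
To make the "yucky" part concrete, here is a minimal sketch of the stale-key problem. The `Person` class, the `people_by_email` dictionary, and the addresses are illustrative assumptions, not code from the question.

```python
class Person:
    """Hypothetical stand-in for the Person objects in the question."""

    def __init__(self, name, email):
        self.name = name
        self.email = email


people_by_email = {}
alice = Person("Alice", "alice@example.com")
people_by_email[alice.email] = alice

# Editing the attribute does not touch the dictionary key.
alice.email = "alice@new.example.com"
stale_key_still_present = "alice@example.com" in people_by_email
new_key_present = "alice@new.example.com" in people_by_email

# Manual fix: re-key the entry wherever the attribute is changed.
people_by_email[alice.email] = people_by_email.pop("alice@example.com")
```

After the assignment the old address still finds the object and the new one does not, so every place that edits `email` also has to remember the re-keying step.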

I'm looking for suggestions on how to tackle the general problem:

Given a bunch of entities with a shared aspect. The aspect is used both for fast access to the entities and within each entity's functionality. Where should the aspect be placed:

inside each entity (not good for fast access)
in the index only (not good for each entity's functionality)
both inside each entity and as an index (duplicate data/references)
somewhere else / somehow differently

The problem can be extended, say, if we want several indexes over the data (SSN, credit card number, etc.). Eventually we may end up with a bunch of SQL tables.

I'm looking for something with the following properties (and more, if you can think of them):

    # create an index on the attribute of a class
    magical_index = magical_index_factory(class, class.attribute)
    # create an object
    obj = class()
    # set the object's attribute
    obj.attribute = value
    # retrieve the object using the attribute as index
    magical_index[value]
    # change the object's attribute to a new value
    obj.attribute = new_value
    # automagically the object can be retrieved using the new value of the attribute
    magical_index[new_value]
    # become less materialistic: get rid of the objects in your life
    del obj
    # object is really gone
    magical_index[new_value]
    KeyError: new_value

I want the objects and the indexes all to play nicely and seamlessly with each other.

Please suggest appropriate design patterns.

Note: the above example is just that, an example, used to illustrate the general problem. So please provide general solutions (of course, you may keep using the example when explaining your general solution).
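
For illustration only, the wish list might be approximated with a descriptor backed by a `weakref.WeakValueDictionary`. This sketch is not taken from either answer: `IndexedAttribute` and this `Person` are hypothetical names, indexed values are assumed to be unique and hashable, and the entry disappearing right after `del` relies on CPython's reference counting (the `gc.collect()` call is there for other interpreters).

```python
import gc
import weakref

_MISSING = object()  # sentinel: attribute has not been set yet


class IndexedAttribute:
    """Descriptor that keeps a value -> object index in sync on assignment."""

    def __init__(self):
        # Weak values: the index alone does not keep an object alive.
        self.index = weakref.WeakValueDictionary()

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class access returns the descriptor itself
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        old = getattr(obj, self.private_name, _MISSING)
        if old is not _MISSING and self.index.get(old) is obj:
            del self.index[old]  # drop the stale key
        setattr(obj, self.private_name, value)
        self.index[value] = obj


class Person:
    email = IndexedAttribute()

    def __init__(self, email):
        self.email = email


by_email = Person.email.index

person = Person("email1@dot.com")
found_by_old = by_email["email1@dot.com"] is person

person.email = "jon@dot.com"  # the index follows the attribute
old_key_gone = "email1@dot.com" not in by_email
found_by_new = by_email["jon@dot.com"] is person

del person    # last strong reference is dropped
gc.collect()  # not needed on CPython, harmless elsewhere
object_gone = "jon@dot.com" not in by_email
```

The trade-off is that the index never owns the objects, so something else (a list, a session, a collection) has to hold the strong references.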

+4

python dictionary design-patterns data-structures indexing

bandana Feb 21 '10 at 12:07

2 answers

Well, another way might be to implement the following:

Attr is an abstraction for a "value" (the Attribute class in the code below). We need it because Python has no "assignment overloading" (the simple get/set paradigm is used as the cleanest alternative). Attr also acts as an "Observable".
AttrSet is an "Observer" of Attrs that tracks changes to their values, effectively acting as an Attr-to-whatever dictionary (Attr-to-person in our case).
create_with_attrs is a factory that produces what looks like a named tuple and forwards attribute access to the supplied Attrs, so person.name = "Ivan" effectively becomes person.name_attr.set("Ivan") and makes the AttrSets observing this person's name rearrange their internals accordingly.

Code (tested):

    from collections import defaultdict


    class Attribute(object):
        """An observable value: registered callbacks run on every set()."""

        def __init__(self, value):
            super(Attribute, self).__init__()
            self._value = value
            self._notified_set = set()

        def set(self, value):
            old = self._value
            self._value = value
            for n_ch in self._notified_set:
                n_ch(old_value=old, new_value=value)

        def get(self):
            return self._value

        def add_notify_changed(self, notify_changed):
            self._notified_set.add(notify_changed)

        def remove_notify_changed(self, notify_changed):
            self._notified_set.remove(notify_changed)


    class AttrSet(object):
        """An index from attribute value to the set of objects holding it."""

        def __init__(self):
            super(AttrSet, self).__init__()
            self._attr_value_to_obj_set = defaultdict(set)
            self._obj_to_attr = {}
            self._attr_to_notify_changed = {}

        def add(self, attr, obj):
            self._obj_to_attr[obj] = attr
            self._add(attr.get(), obj)
            # Re-index obj whenever the observed attribute changes.
            notify_changed = (lambda old_value, new_value:
                              self._notify_changed(obj, old_value, new_value))
            attr.add_notify_changed(notify_changed)
            self._attr_to_notify_changed[attr] = notify_changed

        def get(self, *attr_value_lst):
            # No arguments means "all values".
            attr_value_lst = attr_value_lst or self._attr_value_to_obj_set.keys()
            result = set()
            for attr_value in attr_value_lst:
                result.update(self._attr_value_to_obj_set[attr_value])
            return result

        def remove(self, obj):
            attr = self._obj_to_attr.pop(obj)
            self._remove(attr.get(), obj)
            notify_changed = self._attr_to_notify_changed.pop(attr)
            attr.remove_notify_changed(notify_changed)

        def __iter__(self):
            return iter(self.get())

        def _add(self, attr_value, obj):
            self._attr_value_to_obj_set[attr_value].add(obj)

        def _remove(self, attr_value, obj):
            obj_set = self._attr_value_to_obj_set[attr_value]
            obj_set.remove(obj)
            if not obj_set:
                self._attr_value_to_obj_set.pop(attr_value)

        def _notify_changed(self, obj, old_value, new_value):
            self._remove(old_value, obj)
            self._add(new_value, obj)


    def create_with_attrs(**attr_name_to_attr):
        class Result(object):
            # Attribute access is forwarded to the supplied Attribute objects.
            def __getattr__(self, attr_name):
                if attr_name in attr_name_to_attr.keys():
                    return attr_name_to_attr[attr_name].get()
                else:
                    raise AttributeError(attr_name)

            def __setattr__(self, attr_name, attr_value):
                if attr_name in attr_name_to_attr.keys():
                    attr_name_to_attr[attr_name].set(attr_value)
                else:
                    raise AttributeError(attr_name)

            def __str__(self):
                result = ""
                for attr_name in attr_name_to_attr:
                    result += (attr_name + ": "
                               + str(attr_name_to_attr[attr_name].get())
                               + ", ")
                return result

        return Result()

With data prepared using

    name_and_email_lst = [("John", "email1@dot.com"),
                          ("John", "email2@dot.com"),
                          ("Jack", "email3@dot.com"),
                          ("Hack", "email4@dot.com"),
                          ]

    email = AttrSet()
    name = AttrSet()

    for name_str, email_str in name_and_email_lst:
        email_attr = Attribute(email_str)
        name_attr = Attribute(name_str)
        person = create_with_attrs(email=email_attr, name=name_attr)
        email.add(email_attr, person)
        name.add(name_attr, person)


    def print_set(person_set):
        for person in person_set:
            print(person)
            print()

The following sequence of pseudo-SQL fragments yields:

SELECT id FROM email

    >>> print_set(email.get())
    email: email3@dot.com, name: Jack,
    email: email4@dot.com, name: Hack,
    email: email2@dot.com, name: John,
    email: email1@dot.com, name: John,

SELECT id FROM email WHERE email = "email1@dot.com"

    >>> print_set(email.get("email1@dot.com"))
    email: email1@dot.com, name: John,

SELECT id FROM email WHERE email = "email1@dot.com" OR email = "email2@dot.com"

    >>> print_set(email.get("email1@dot.com", "email2@dot.com"))
    email: email1@dot.com, name: John,
    email: email2@dot.com, name: John,

SELECT id FROM name WHERE name = "John"

    >>> print_set(name.get("John"))
    email: email1@dot.com, name: John,
    email: email2@dot.com, name: John,

SELECT id FROM name, email WHERE name = "John" AND email = "email1@dot.com"

    >>> print_set(name.get("John").intersection(email.get("email1@dot.com")))
    email: email1@dot.com, name: John,

UPDATE email, name SET email = "jon@dot.com", name = "Jon" WHERE id IN SELECT id FROM email WHERE email = "email1@dot.com"

    >>> person = email.get("email1@dot.com").pop()
    >>> person.name = "Jon"; person.email = "jon@dot.com"
    >>> print_set(email.get())
    email: email3@dot.com, name: Jack,
    email: email4@dot.com, name: Hack,
    email: email2@dot.com, name: John,
    email: jon@dot.com, name: Jon,

DELETE FROM email, name WHERE id = %s

SELECT id FROM email

    >>> name.remove(person)
    >>> email.remove(person)
    >>> print_set(email.get())
    email: email3@dot.com, name: Jack,
    email: email4@dot.com, name: Hack,
    email: email2@dot.com, name: John,

0

mlvljr Feb 23 '10 at 12:40

S. Lott · Accepted Answer · 2010-02-21T12:17:43+0000

Consider this.

    class Person(object):
        def __init__(self, name, addr, email, etc.):
            self.observer = []
            ... etc. ...

        @property
        def name(self):
            return self._name

        @name.setter
        def name(self, value):
            self._name = value
            # Notify everything registered in self.observer.
            for observer in self.observer:
                observer.update(self)

        ... etc. ...

This observer attribute implements an Observable that notifies Observers of updates. It is the list of observers that must be notified of changes.

Each attribute is wrapped in a property. Using descriptors is probably better, because it saves repeating the observer notification in every setter.
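
A descriptor version might look like the sketch below. It is one possible reading of the suggestion, not the answer's own code: the `Observed` descriptor and the `Recorder` observer are hypothetical names, and the only thing carried over from the answer is the `observer` list with an `update(person)` callback.

```python
class Observed:
    """Descriptor: stores a value on the instance and notifies its observers."""

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, value)
        # Written once here instead of once per property setter.
        for observer in obj.observer:
            observer.update(obj)


class Person:
    name = Observed()
    email = Observed()

    def __init__(self, name, email):
        self.observer = []  # must exist before the first observed assignment
        self.name = name
        self.email = email


class Recorder:
    """Hypothetical observer that only records what it was told."""

    def __init__(self):
        self.seen = []

    def update(self, person):
        self.seen.append((person.name, person.email))


p = Person("John", "email1@dot.com")
log = Recorder()
p.observer.append(log)
p.email = "jon@dot.com"
print(log.seen)  # [('John', 'jon@dot.com')]
```

Adding another observed attribute is then a single class-level line rather than a new property pair.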

    import collections

    class PersonCollection(set):
        def __init__(self, *args, **kw):
            self.byName = collections.defaultdict(list)
            self.byEmail = collections.defaultdict(list)
            super(PersonCollection, self).__init__(*args, **kw)

        def add(self, person):
            super(PersonCollection, self).add(person)  # a set has add(), not append()
            person.observer.append(self)
            self.byName[person.name].append(person)
            self.byEmail[person.email].append(person)

        def update(self, person):
            """This person changed. Find them in old indexes and fix them."""
            # Each index value is a list of persons, so test membership by identity.
            changed = [k for k, v in self.byName.items()
                       if any(p is person for p in v)]
            for k in changed:
                self.byName[k].remove(person)
                if not self.byName[k]:
                    self.byName.pop(k)
            self.byName[person.name].append(person)
            changed = [k for k, v in self.byEmail.items()
                       if any(p is person for p in v)]
            for k in changed:
                self.byEmail[k].remove(person)
                if not self.byEmail[k]:
                    self.byEmail.pop(k)
            self.byEmail[person.email].append(person)

        # ... etc. ... for all methods of a collections.Set.

See the collections ABCs for more information on what needs to be implemented.

http://docs.python.org/library/collections.html#abcs-abstract-base-classes

If you want "generic" indexing, your collection can be parameterized with the names of attributes, and you can use getattr to get those named attributes from the underlying objects.

    class GenericIndexedCollection(set):
        attributes_to_index = []  # List of attribute names

        def __init__(self, *args, **kw):
            # One index (value -> list of objects) per attribute name.
            self.indexes = dict((n, {}) for n in self.attributes_to_index)
            super(GenericIndexedCollection, self).__init__(*args, **kw)

        def add(self, person):
            super(GenericIndexedCollection, self).add(person)
            for i in self.indexes:
                self.indexes[i].setdefault(getattr(person, i), []).append(person)

Note. To properly emulate a database, use a set, not a list. Database tables are (theoretically) sets. As a practical matter they are unordered, and an index allows the database to reject duplicates. Some RDBMSs do not reject duplicate rows because, without an index, it is too expensive to check.
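
As a rough illustration of how an index makes duplicate rejection cheap, a unique index can be sketched as a dictionary that refuses a second object with the same key. `UniqueIndex` and this small `Person` are hypothetical names introduced for the example, and raising `ValueError` is an arbitrary choice.

```python
class Person:
    """Hypothetical record type with the attribute being indexed."""

    def __init__(self, name, email):
        self.name = name
        self.email = email


class UniqueIndex:
    """Maps an attribute value to exactly one object, like a unique DB index."""

    def __init__(self, attr_name):
        self.attr_name = attr_name
        self._by_value = {}

    def add(self, obj):
        key = getattr(obj, self.attr_name)
        if key in self._by_value:
            # One hash lookup instead of scanning every stored object.
            raise ValueError("duplicate %s: %r" % (self.attr_name, key))
        self._by_value[key] = obj

    def __getitem__(self, key):
        return self._by_value[key]


by_email = UniqueIndex("email")
by_email.add(Person("John", "email1@dot.com"))
try:
    by_email.add(Person("Jack", "email1@dot.com"))
    rejected = False
except ValueError:
    rejected = True
```

Without the dictionary, the same check would mean comparing the new object against every existing one, which is the cost the note refers to.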
