Python: getting elements from a set

In general, Python sets do not seem to be designed for retrieving items by key. That is obviously what dictionaries are for. But is there any way, given a key, to retrieve the instance from a set that is equal to that key?

Again, I know that this is exactly what dictionaries are for, but as far as I can see, there are legitimate reasons to want to do this with a set. Suppose you have a specific class:

    class Person:
        def __init__(self, firstname, lastname, age):
            self.firstname = firstname
            self.lastname = lastname
            self.age = age

Now suppose that I am going to create a large number of Person objects, and every time I create one, I need to make sure that it is not a duplicate of a previously created Person object. A Person is considered a duplicate of another Person if it has the same firstname, regardless of the other instance variables. So the natural thing to do is to insert all Person objects into a set and define __hash__ and __eq__ so that Person objects are compared by their firstname.
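A minimal sketch of the set-based approach described above. The __eq__ and __hash__ bodies are not given in the question; they are one plausible way to compare Person objects by firstname only, and the sample names are made up.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __eq__(self, other):
        # Assumption: two Person objects are equal when firstname matches.
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname

    def __hash__(self):
        # Must agree with __eq__, so hash only the firstname.
        return hash(self.firstname)


people = set()
people.add(Person("Ada", "Lovelace", 36))
people.add(Person("Ada", "Byron", 20))  # same firstname: treated as a duplicate

# The set keeps the first object; adding an equal one does not replace it.
print(len(people))
```

A membership test such as `Person("Ada", "", 0) in people` works, but the set has no method that hands back the stored instance, which is the gap the question is about.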

An alternative would be to create a dictionary of Person objects and use the firstname string as the key. The disadvantage here is that I would be duplicating the firstname string. In most cases this is not a problem, but what if I have 10,000,000 Person objects? The redundant string storage could really start to add up in terms of memory usage.

But if two Person objects compare equal, I need to be able to retrieve the original object so that the additional instance variables (other than firstname) can be merged in the way the business logic requires. This brings me back to my problem: I need a way to retrieve instances from a set.

Is there any way to do this? Or is using a dictionary the only real option?

+6
3 answers

I would definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key will not copy it; the dictionary will simply refer to the same object. I doubt that a dictionary will use significantly more memory than a set.
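A small sketch of that dictionary approach. The names `registry` and `register` are hypothetical helpers, not part of the question; `dict.setdefault` is used so that a lookup and an insert happen in one step.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


registry = {}  # hypothetical: maps firstname -> first Person seen with that name


def register(person):
    """Return the Person already stored under this firstname, or store `person`."""
    return registry.setdefault(person.firstname, person)


# Build the name at runtime so the check below is not explained by
# the interpreter sharing identical string literals.
name = "".join(["A", "da"])
first = register(Person(name, "Lovelace", 36))
original = register(Person("Ada", "Byron", 20))  # duplicate firstname

# The duplicate lookup hands back the original object, ready for merging.
print(original is first)

# The dictionary key is the very same string object as the attribute, not a copy.
key = next(iter(registry))
print(key is first.firstname)
```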

To actually save memory, add a __slots__ attribute to your class. This will prevent each of your 10,000,000 instances from having its own __dict__ attribute, which will save a lot more memory than the potential overhead of a dict compared to a set.
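To illustrate the effect of __slots__, here is a short sketch. The two class names are made up for the comparison; the behaviour shown applies to Python 3 classes that inherit only from object.

```python
class PersonWithDict:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


class PersonWithSlots:
    # Only these attributes may exist; no per-instance __dict__ is created.
    __slots__ = ("firstname", "lastname", "age")

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


a = PersonWithDict("Ada", "Lovelace", 36)
b = PersonWithSlots("Ada", "Lovelace", 36)

print(hasattr(a, "__dict__"))  # ordinary instances carry a __dict__
print(hasattr(b, "__dict__"))  # slotted instances do not

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    b.nickname = "Countess"
    added = True
except AttributeError:
    added = False
print(added)
```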

Edit: Some numbers to back up my claims. I defined a silly example class that stores pairs of random strings:

    import random

    def rand_str():
        # Random lowercase string of 3 to 15 characters.
        return str.join("", (chr(random.randrange(97, 123))
                             for i in range(random.randrange(3, 16))))

    class A(object):
        def __init__(self):
            self.x = rand_str()
            self.y = rand_str()
        def __hash__(self):
            return hash(self.x)
        def __eq__(self, other):
            return self.x == other.x

The amount of memory used by a set of 1,000,000 instances of this class

    random.seed(42)
    s = set(A() for i in xrange(1000000))  # xrange is Python 2; use range on Python 3

is 240 MB on my machine. If I add

  __slots__ = ("x", "y") 

to the class, this is reduced to 112 MB. If I store the same data in a dictionary

    def key_value():
        a = A()
        return a.x, a

    random.seed(42)
    d = dict(key_value() for i in xrange(1000000))

it uses 249 MB without __slots__ and 121 MB with __slots__ .
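The answer does not say how these figures were measured. One way to take a comparable measurement on Python 3 is the standard-library tracemalloc module, sketched below. The helper `traced_bytes` and the reduced size `N` are assumptions made for illustration, and the numbers it prints will differ from those quoted above.

```python
import random
import tracemalloc


def rand_str():
    # Random lowercase string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for _ in range(random.randrange(3, 16)))


class A:
    __slots__ = ("x", "y")

    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x


def traced_bytes(build):
    """Return the bytes still allocated once build() has run (hypothetical helper)."""
    tracemalloc.start()
    container = build()  # keep a reference so nothing is freed before measuring
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del container
    return current


N = 10_000  # far smaller than the 1,000,000 used above, to keep the run short

random.seed(42)
set_bytes = traced_bytes(lambda: set(A() for _ in range(N)))

random.seed(42)
dict_bytes = traced_bytes(lambda: {a.x: a for a in (A() for _ in range(N))})

print(set_bytes, dict_bytes)
```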

+8

Yes, you can do it: a set can be iterated over. But note that this is an O(n) operation, as opposed to the O(1) lookup of a dict.
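Iterating over the set to find the stored instance could look like the sketch below. The helper `find_equal` is a hypothetical name, and the Person class assumes equality by firstname as in the question.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname

    def __hash__(self):
        return hash(self.firstname)


def find_equal(items, key):
    """Return the element of `items` equal to `key`, or None (hypothetical helper).

    This scans the set linearly, so it is O(n) in the worst case.
    """
    return next((item for item in items if item == key), None)


people = {Person("Ada", "Lovelace", 36), Person("Alan", "Turing", 41)}

probe = Person("Ada", "Byron", 20)
stored = find_equal(people, probe)

print(stored.lastname)                 # the stored object, not the probe
print(find_equal(people, Person("Grace", "Hopper", 85)))
```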

So you have to trade speed against memory. This is a classic. Personally, I would optimize for speed here (that is, use the dictionary), since memory will not run short that quickly with only 10,000,000 objects, and using dictionaries is very easy.

As for the additional memory consumption of the firstname string: since strings are immutable in Python, using the firstname attribute as a key will not create a new string, but simply copy the reference.

+3

I think you will find the answer here:

Moving out of the factory in Python

+1

Source: https://habr.com/ru/post/887938/

