Python: best way to split a list of objects by field?

I have a list of Person objects, each with an identifier and a country:

class Person(object):
    def __init__(self, id, country):
        self.id = str(id)
        self.country = str(country)

The list looks like this. The id is just a UUID and the country is a country code; I have sorted the list by country:

('7e569521-69fe-4ccf-a898-254bd758bff0', 'AF')
('c6b45478-6901-4a22-aab8-7167397d4b13', 'AF')
('15aee743-a1b1-4a77-b93b-17786c8c8fab', 'AF')
('7ef1efd3-6b77-4dfe-b133-035eff76d7f6', 'AF')
('95880e05-9984-48e3-a60a-0cf52c2915ae', 'AG')
('620862a0-e888-4b20-8057-085122226050', 'AL')
('ed0caf58-e132-48ad-bfca-8a4df2b0c351', 'AL')
('730cf6ba-0981-4a0b-878e-5df0ebedaa99', 'AM')
('93f87a3d-d618-4e9a-9f44-4a1d0bc65bdc', 'AM')

Now I would like to divide them into different lists by country.

Here is what I am doing now:

prev_country = ""
person_data_country = []

for person in persons_data:

    if prev_country != person.country:
        if len(person_data_country) > 0:
            # do something with this new list by country

            # clear them    
            person_data_country = []

    # append item to new list
    person_data_country.append(person)
    prev_country = person.country

# last list, if any
if len(person_data_country) > 0:
    # do something with this new list by country
    pass  # placeholder so the block is valid Python

I get what I want with the code above.

But I would like to know if there is a better or more efficient way to split the list by country?

+4
3 answers

You can use itertools.groupby ( https://docs.python.org/3.6/library/itertools.html#itertools.groupby ) to achieve this:

from itertools import groupby
grouped_data = groupby(persons_data, key=lambda x: x[1])  # or x.country, depending on your input list
for country, items in grouped_data:
    # do whatever you want with each group here
    pass

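There are a few pitfalls to keep in mind:

  • groupby only groups consecutive items, so the input must already be sorted by the key.
  • items is an iterator, not a list. It is consumed as you iterate, so convert it with list(items) if you need to keep it.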
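
To make the two groupby caveats concrete (it only merges adjacent items with equal keys, and each group is a lazy iterator), here is a minimal sketch. The rows list and the by_country helper are made up for illustration and are not from the question:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (id, country) tuples, deliberately NOT sorted by country.
rows = [("a", "AF"), ("b", "AL"), ("c", "AF")]
by_country = itemgetter(1)

# groupby only merges adjacent items with equal keys, so 'AF' shows up twice here.
unsorted_keys = [country for country, _ in groupby(rows, key=by_country)]

# Sorting by the same key first yields one group per country.
sorted_rows = sorted(rows, key=by_country)
sorted_keys = [country for country, _ in groupby(sorted_rows, key=by_country)]

# Each group is a lazy iterator that shares state with the groupby object.
grouped = groupby(sorted_rows, key=by_country)
first_country, first_items = next(grouped)
next(grouped)                 # advance to the next group without reading the first
leftover = list(first_items)  # the first group can no longer be read

print(unsorted_keys, sorted_keys, first_country, leftover)
```

If a group needs to outlive the loop iteration, calling list() on it before moving on is the usual way to keep its contents.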
+4

Use itertools.groupby. Since persons_data is already sorted by country, you can group it directly:

import itertools
import operator

bycountry = operator.attrgetter("country")

all_people_by_country = []

for country, groupiter in itertools.groupby(persons_data, bycountry):
    all_people_by_country.append(list(groupiter))
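
The loop above keeps the groups but drops the country key. If the key is needed later, one possible variation is a dict comprehension over the same groupby call. This is only a sketch: the sample persons_data and the names people_by_country and ids_by_country are assumptions for illustration, and the input is assumed to be sorted by country as in the question:

```python
from itertools import groupby
from operator import attrgetter


class Person(object):
    def __init__(self, id, country):
        self.id = str(id)
        self.country = str(country)


# Hypothetical sample data, already sorted by country as in the question.
persons_data = [Person("one", "AF"), Person("two", "AF"), Person("three", "AG")]

by_country = attrgetter("country")

# Materialize each group with list() while it is still the current group.
people_by_country = {
    country: list(group)
    for country, group in groupby(persons_data, key=by_country)
}

# For display only: reduce each group to its ids.
ids_by_country = {
    country: [p.id for p in people]
    for country, people in people_by_country.items()
}
print(ids_by_country)
```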
+2

Another approach to consider, if I understand you correctly:

from collections import defaultdict
persons = [
    Person('one', 'AF'),
    Person('two', 'AF'),
    Person('three', 'AG')
]
persons_by_country = defaultdict(list)
for person in persons:
    persons_by_country[person.country].append(person.id)

Or, if you want to avoid defaultdict for any reason,

persons_by_country = {}
for person in persons:
    if person.country in persons_by_country:
        persons_by_country[person.country].append(person.id)
    else:
        persons_by_country[person.country] = [person.id]

In any case, the result will be:

{'AG': ['three'], 'AF': ['one', 'two']}

The main disadvantage is that all data is stored in memory twice.
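
If the memory note is a concern, one option is to store the Person objects themselves in the grouped lists rather than copies of their fields; a Python list holds references, so the objects are not duplicated. The sketch below uses dict.setdefault as a third way to write the same loop, and it does not need sorted input. The sample persons list is made up for illustration:

```python
class Person(object):
    def __init__(self, id, country):
        self.id = str(id)
        self.country = str(country)


# Hypothetical sample data, deliberately not sorted by country.
persons = [Person("one", "AF"), Person("three", "AG"), Person("two", "AF")]

persons_by_country = {}
for person in persons:
    # setdefault returns the existing list for this country, or inserts a new one.
    persons_by_country.setdefault(person.country, []).append(person)

# The grouped lists refer to the same objects as the original list.
same_object = persons_by_country["AF"][0] is persons[0]
af_ids = [p.id for p in persons_by_country["AF"]]
print(af_ids, same_object)
```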

+1

Source: https://habr.com/ru/post/1676322/

