Efficient list mapping in Python

I have the following input:

    input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]

and I am trying to get the following output:

    outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]
    outputmapping = {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

Any advice on how to handle this with scalability in mind? The input can be very large.

4 answers

You probably want something like:

    import collections
    import itertools

    def build_catalog(L):
        # Each name not seen before gets the next integer from the counter.
        counter = itertools.count()
        names = collections.defaultdict(lambda: next(counter))
        result = []
        for t in L:
            new_t = [names[item] for item in t]
            result.append(new_t)
        # Invert name -> index into index -> name.
        catalog = dict((idx, name) for name, idx in names.items())
        return result, catalog

Using it:

    >>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
    >>> outputlist, outputmapping = build_catalog(input)
    >>> outputlist
    [[0, 0, 1, 2], [1, 3, 4, 2]]
    >>> outputmapping
    {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
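
Since the question mentions very large input, here is a minimal sketch of a lazy variant that yields one encoded row at a time instead of building the whole result list up front. The names `encode_rows` and `ids` are made up for this illustration, and the sketch assumes the items are hashable and that the set of distinct names still fits in memory.

```python
def encode_rows(rows, ids):
    """Yield each row as a list of integer IDs, updating `ids` in place."""
    for row in rows:
        # len(ids) is evaluated before setdefault runs, so a name that has
        # not been seen yet receives the next unused integer.
        yield [ids.setdefault(item, len(ids)) for item in row]

ids = {}  # name -> integer ID, filled in as rows are consumed
rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]

# list() is used here only to show the result; a real consumer could
# iterate over encode_rows(...) and write each row out as it arrives.
encoded = list(encode_rows(rows, ids))
mapping = {idx: name for name, idx in ids.items()}
```

Only the name table grows with the data here; the encoded rows can be processed and discarded as they are produced.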

This class automatically maps objects to integer values:

    class AutoMapping(object):
        def __init__(self):
            self.map = {}       # object -> integer
            self.objects = []   # integer -> object, by list position

        def __getitem__(self, val):
            if val not in self.map:
                self.map[val] = len(self.objects)
                self.objects.append(val)
            return self.map[val]

Example usage, for input:

    >>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
    >>> map = AutoMapping()
    >>> [[map[x] for x in y] for y in input]
    [[0, 0, 1, 2], [1, 3, 4, 2]]
    >>> map.objects
    ['dog', 'cat', 'mouse', 'ruby', 'python']
    >>> dict(enumerate(map.objects))
    {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

Here is one possible solution, although it is not the best. It could be made slightly more efficient by preallocating each output row, if you know in advance how many elements each entry in the list will have.

    labels = []
    label2index = {}
    outputlist = []
    for group in input:
        current = []
        for label in group:
            if label not in label2index:
                label2index[label] = len(labels)
                labels.append(label)
            current.append(label2index[label])
        outputlist.append(current)

    outputmapping = {}
    for idx, val in enumerate(labels):
        outputmapping[idx] = val
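
The preallocation idea could look roughly like the sketch below. It assumes every group has exactly `width` items; `encode_fixed_width` and `width` are names invented for this example. In CPython the gain over `append` is likely to be small, so it is worth measuring before relying on it.

```python
def encode_fixed_width(groups, width):
    """Encode groups of exactly `width` labels into rows of integer IDs."""
    label2index = {}
    outputlist = []
    for group in groups:
        current = [0] * width  # allocate the whole row once
        for pos, label in enumerate(group):
            if label not in label2index:
                label2index[label] = len(label2index)
            current[pos] = label2index[label]
        outputlist.append(current)
    # Rebuild the index -> label list from the finished table.
    labels = [None] * len(label2index)
    for label, idx in label2index.items():
        labels[idx] = label
    return outputlist, labels

data = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
encoded_rows, label_list = encode_fixed_width(data, 4)
```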

I had the same problem quite often in my projects, so I wrote a class some time ago that does just that:

    import itertools

    class UniqueIdGenerator(object):
        """A dictionary-like class that can be used to assign unique
        integer IDs to names.

        Usage:

        >>> gen = UniqueIdGenerator()
        >>> gen["A"]
        0
        >>> gen["B"]
        1
        >>> gen["C"]
        2
        >>> gen["A"]      # Retrieving already existing ID
        0
        >>> len(gen)      # Number of already used IDs
        3
        """

        def __init__(self, id_generator=None):
            """Creates a new unique ID generator. `id_generator` specifies
            how new IDs are assigned to elements that do not have an ID
            yet. If it is `None`, elements will be assigned integer
            identifiers starting from 0. If it is an integer, elements
            will be assigned identifiers starting from the given integer.
            If it is an iterator or generator, `next()` will be called on
            it every time a new ID is needed."""
            if id_generator is None:
                id_generator = 0
            if isinstance(id_generator, int):
                self._generator = itertools.count(id_generator)
            else:
                self._generator = id_generator
            self._ids = {}

        def __getitem__(self, item):
            """Retrieves the ID corresponding to `item`. Generates a new
            ID for `item` if it is the first time we request an ID for it."""
            try:
                return self._ids[item]
            except KeyError:
                self._ids[item] = next(self._generator)
                return self._ids[item]

        def __len__(self):
            """Retrieves the number of added elements in this
            UniqueIdGenerator"""
            return len(self._ids)

        def reverse_dict(self):
            """Returns the reversed mapping, i.e., the one that maps
            generated IDs to their corresponding items"""
            return dict((v, k) for k, v in self._ids.items())

        def values(self):
            """Returns the list of items added so far. Items are ordered
            by their IDs, so they will be in exactly the order they were
            added if the ID generator produces IDs in ascending order.
            This holds, for instance, for numeric ID generators that
            assign integers starting from a given number."""
            return sorted(self._ids.keys(), key=self._ids.__getitem__)

Usage example:

    >>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
    >>> gen = UniqueIdGenerator()
    >>> outputlist = [[gen[x] for x in y] for y in input]
    >>> print(outputlist)
    [[0, 0, 1, 2], [1, 3, 4, 2]]
    >>> outputmapping = gen.reverse_dict()
    >>> print(outputmapping)
    {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
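
The docstring also describes starting the IDs from a given integer. To illustrate just that behaviour without repeating the class, here is a compact stand-in built on `collections.defaultdict` and `itertools.count`. It is not the class above; `make_id_table` and the starting value 100 are assumptions chosen for this example.

```python
import collections
import itertools

def make_id_table(start=0):
    """Return a dict-like table that hands out consecutive IDs from `start`."""
    counter = itertools.count(start)
    # The default factory runs only for keys that are not in the table yet.
    return collections.defaultdict(lambda: next(counter))

id_table = make_id_table(start=100)
animals = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
offset_rows = [[id_table[x] for x in row] for row in animals]
offset_mapping = {idx: name for name, idx in id_table.items()}
```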

Source: https://habr.com/ru/post/1308232/

